A Novel Sentence Transformer-based Natural Language Processing Approach for Schema Mapping of Electronic Health Records to the OMOP Common Data Model
Presentation Time: 10:15 AM - 10:30 AM
Abstract Keywords: Deep Learning, Large Language Models (LLMs), Natural Language Processing, Machine Learning
Working Group: Natural Language Processing Working Group
Primary Track: Applications
Programmatic Theme: Clinical Informatics
Mapping electronic health records (EHR) data to common data models (CDMs) enables the standardization of clinical records, enhancing interoperability and enabling large-scale, multi-centered clinical investigations. Using 2 large publicly available datasets, we developed transformer-based natural language processing models to map medication-related concepts from the EHR at a large and diverse healthcare system to standard concepts in OMOP CDM. We validated the model outputs against standard concepts manually mapped by clinicians. Our best model reached out-of-box accuracies of 96.5% in mapping the 200 most common drugs and 83.0% in mapping 200 random drugs in the EHR. For these tasks, this model outperformed a state-of-the-art large language model (SFR-Embedding-Mistral, 89.5% and 66.5% in accuracy for the two tasks), a widely-used software for schema mapping (Usagi, 90.0% and 70.0% in accuracy), and direct string match (7.5% and 7.5% accuracy). Transformer-based deep learning models outperform existing approaches in the standardized mapping of EHR elements and can facilitate an end-to-end automated EHR transformation pipeline.
Speaker(s):
Xinyu Zhou
Yale University
Author(s):
Xinyu Zhou - Yale University; Lovedeep S Dhingra, MBBS - Yale University; Arya Aminorroaya, MD, MPH - Yale University; Philip Adejumo, BS - Yale University; Rohan Khera, MD, MS;
Presentation Time: 10:15 AM - 10:30 AM
Abstract Keywords: Deep Learning, Large Language Models (LLMs), Natural Language Processing, Machine Learning
Working Group: Natural Language Processing Working Group
Primary Track: Applications
Programmatic Theme: Clinical Informatics
Mapping electronic health records (EHR) data to common data models (CDMs) enables the standardization of clinical records, enhancing interoperability and enabling large-scale, multi-centered clinical investigations. Using 2 large publicly available datasets, we developed transformer-based natural language processing models to map medication-related concepts from the EHR at a large and diverse healthcare system to standard concepts in OMOP CDM. We validated the model outputs against standard concepts manually mapped by clinicians. Our best model reached out-of-box accuracies of 96.5% in mapping the 200 most common drugs and 83.0% in mapping 200 random drugs in the EHR. For these tasks, this model outperformed a state-of-the-art large language model (SFR-Embedding-Mistral, 89.5% and 66.5% in accuracy for the two tasks), a widely-used software for schema mapping (Usagi, 90.0% and 70.0% in accuracy), and direct string match (7.5% and 7.5% accuracy). Transformer-based deep learning models outperform existing approaches in the standardized mapping of EHR elements and can facilitate an end-to-end automated EHR transformation pipeline.
Speaker(s):
Xinyu Zhou
Yale University
Author(s):
Xinyu Zhou - Yale University; Lovedeep S Dhingra, MBBS - Yale University; Arya Aminorroaya, MD, MPH - Yale University; Philip Adejumo, BS - Yale University; Rohan Khera, MD, MS;
A Novel Sentence Transformer-based Natural Language Processing Approach for Schema Mapping of Electronic Health Records to the OMOP Common Data Model
Category
Paper - Regular
Description
Date: Wednesday (11/13)
Time: 10:15 AM to 10:30 AM
Room: Continental Ballroom 1-2
Time: 10:15 AM to 10:30 AM
Room: Continental Ballroom 1-2