Times are displayed in (UTC-07:00) Pacific Time (US & Canada)
11/13/2024 | 9:45 AM – 11:00 AM | Continental Ballroom 1-2
S114: Ontologies and Data Models - This IS-A Topic
Presentation Type: Oral
Session Chair:
Nicholas Anderson, PhD - University of California, Davis
Converting OMOP CDM to Phenopackets: A Model Alignment and Patient Data Representation Evaluation
Presentation Time: 09:45 AM - 10:00 AM
Abstract Keywords: Data Transformation/ETL, Interoperability and Health Information Exchange, Data Sharing, Controlled Terminologies, Ontologies, and Vocabularies
Primary Track: Foundations
Programmatic Theme: Clinical Informatics
The study developed a data transformation process from OMOP CDM to Phenopackets, an emerging data standard designed for multimodal patient data storage and exchange. We evaluated transformations using real-world data, and incorporated UMLS semantic type filtering to reconcile ambiguous model alignment. We further evaluated Phenopackets’ suitability in representing real-world clinical cases. The data model conversion bridges OMOP’s large-scale research capabilities with Phenopackets’ support for biomedical knowledge integration via ontologies and capacity for point-of-care deployment.
Speaker(s):
Kayla Schiffer-Kane, MA
Columbia University
Author(s):
Chunhua Weng, PhD - Columbia University; Cong Liu, PhD - Columbia University; Casey Ta - Columbia University Dept of Biomedical Informatics; Jordan Nestor, MD, MS - Columbia University; Tiffany Callahan, MPH, PhD - Columbia University Irving Medical Center
Optimizing Medication Querying Using Ontology-Driven Approach with OMOP: with an application to a large-scale COVID-19 EHR dataset
Presentation Time: 10:00 AM - 10:15 AM
Abstract Keywords: Information Extraction, Clinical Decision Support, Information Visualization, Usability
Primary Track: Applications
Programmatic Theme: Clinical Research Informatics
Efficient medication querying in Electronic Health Record (EHR) datasets is crucial for effective patient care and clinical research. However, the complexity and volume of such datasets present significant challenges in extracting relevant medication information accurately. In this study, we propose an ontology-driven medication query optimization approach, named ODMQ, leveraging the Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM) to enhance medication querying capabilities. By integrating semantic ontology structures of OMOP CDM, our method provides a simpler and more convenient way to obtain a comprehensive list of drug names, National Drug Codes, and generic names. This enhancement reduces the time required for clinical researchers to manually search for medication information and improves query capability. We validate the efficacy and scalability of our methodology by conducting evaluations and experiments on an extensive real-world COVID-19 EHR dataset. The experimental results demonstrate that ODMQ can effectively improve medication query outcomes. Through a comprehensive manual review of all expansion results, ODMQ not only covers the medication terms provided by domain experts but also ensures that the expanded search terms (ranging from several times to a dozen times more than those provided by the domain experts) are relevant to the user's input. Our study contributes to the advancement of ontology-driven techniques aimed at optimizing medication querying processes.
Speaker(s):
Xiaojin Li
UTHealth
Author(s):
Yan Huang - UT Health Science Center; Licong Cui, PhD - The University of Texas Health Science Center at Houston (UTHealth Houston) School of Biomedical Informatics; Shiqiang Tao, PhD - The University of Texas Health Science Center at Houston; GQ Zhang, PhD - The University of Texas Health Science Center at Houston
A Novel Sentence Transformer-based Natural Language Processing Approach for Schema Mapping of Electronic Health Records to the OMOP Common Data Model
Presentation Time: 10:15 AM - 10:30 AM
Abstract Keywords: Deep Learning, Large Language Models (LLMs), Natural Language Processing, Machine Learning
Working Group: Natural Language Processing Working Group
Primary Track: Applications
Programmatic Theme: Clinical Informatics
Mapping electronic health record (EHR) data to common data models (CDMs) enables the standardization of clinical records, enhancing interoperability and enabling large-scale, multi-center clinical investigations. Using two large publicly available datasets, we developed transformer-based natural language processing models to map medication-related concepts from the EHR at a large and diverse healthcare system to standard concepts in OMOP CDM. We validated the model outputs against standard concepts manually mapped by clinicians. Our best model reached out-of-the-box accuracies of 96.5% in mapping the 200 most common drugs and 83.0% in mapping 200 random drugs in the EHR. For these tasks, this model outperformed a state-of-the-art large language model (SFR-Embedding-Mistral, 89.5% and 66.5% accuracy for the two tasks), a widely used schema-mapping tool (Usagi, 90.0% and 70.0% accuracy), and direct string matching (7.5% and 7.5% accuracy). Transformer-based deep learning models outperform existing approaches in the standardized mapping of EHR elements and can facilitate an end-to-end automated EHR transformation pipeline.
Speaker(s):
Xinyu Zhou
Yale University
Author(s):
Xinyu Zhou - Yale University; Lovedeep S Dhingra, MBBS - Yale University; Arya Aminorroaya, MD, MPH - Yale University; Philip Adejumo, BS - Yale University; Rohan Khera, MD, MS
Evaluating the portability of automatic note classification methods with the LOINC Document Ontology
Presentation Time: 10:30 AM - 10:45 AM
Abstract Keywords: Controlled Terminologies, Ontologies, and Vocabularies, Natural Language Processing, Reproducibility
Primary Track: Applications
To utilize clinical notes for research studies, it is necessary to identify the most relevant notes. Mapping to the LOINC Document Ontology makes this process easier by reducing the variability of note types. We used a BERT model to automatically identify LOINC DO entities in VA note titles. Future work will involve the use of additional note metadata and contents to improve note classification.
Speaker(s):
Annie Bowles, MS Biomedical Informatics
VHA Salt Lake City Health Care System
Author(s):
Patrick Alba, MS - United States Department of Veterans Affairs; Jianlin Shi, MS, MD - The Division of Epidemiology, School of Medicine, University of Utah; VA Salt Lake City Healthcare System; Qiwei Gan; Scott DuVall, PhD - VA Salt Lake City Health Care System; Elizabeth Hanchrow, RN, MSN - Veterans Affairs and WIVR
Use of a Structured Knowledge Base Enhances Metadata Curation by Large Language Models
Presentation Time: 10:45 AM - 11:00 AM
Abstract Keywords: Data Standards, Large Language Models (LLMs), Controlled Terminologies, Ontologies, and Vocabularies
Primary Track: Applications
Programmatic Theme: Clinical Research Informatics
Metadata play a crucial role in ensuring the findability, accessibility, interoperability, and reusability of datasets. This paper investigates the potential of large language models (LLMs), specifically GPT-4, to improve adherence to metadata standards. We conducted experiments on 200 random data records describing human samples relating to lung cancer from the NCBI BioSample repository, evaluating GPT-4's ability to suggest edits for adherence to metadata standards. We computed the adherence accuracy of field name–field value pairs through a peer review process, and we observed a marginal average improvement in adherence to the standard data dictionary from 79% to 80% (p<0.01). We then prompted GPT-4 with domain information in the form of the textual descriptions of CEDAR templates and recorded a significant improvement to 97% from 79% (p<0.01). These results indicate that, while LLMs may not be able to correct legacy metadata to ensure satisfactory adherence to standards when unaided, they do show promise for use in automated metadata curation when integrated with a structured knowledge base.
Speaker(s):
Sowmya Somasundaram, Postdoc
Stanford
Author(s):
Sowmya Somasundaram, Postdoc - Stanford; Benjamin Solomon, Postdoc - Stanford University; Avani Khatri, M.S.; Anisha Laumas, M.S. - Stanford University; Purvesh Khatri - Stanford University; Mark Musen, MD, PhD - Stanford University
Description
Date: Wednesday (11/13)
Time: 9:45 AM to 11:00 AM
Room: Continental Ballroom 1-2