Times are displayed in (UTC-04:00) Eastern Time (US & Canada)
3/10/2025 | 3:30 PM – 5:00 PM | Frick
S03: Ontologies, Standards, and Standardization
Presentation Type: Podium Abstract
Session Credits: 1.5
Session Chair:
Anthony Solomonides, PhD, MSc(Math), FAMIA, FACMI - Research Institute, Endeavor Health
Identifying Dietary Supplements Related Effects from Social Media by ChatGPT
Presentation Time: 03:30 PM - 03:45 PM
Abstract Keywords: Machine Learning, Generative AI, and Predictive Modeling, Natural Language Processing, Data Mining and Knowledge Discovery
Primary Track: Data Science/Artificial Intelligence
Programmatic Theme: Harnessing the Power of Large Language Models in Health Data Science
This study advances relationship identification in social media by analyzing dietary supplement-related tweets, with the aim of expanding the drug-supplement interactions dataset iDisk. We collected 90,000+ tweets (2007-2022) and annotated 1,000 for nuanced relationships and entities. Using a BioBERT model and ChatGPT-generated prompts, we conducted entity type and relationship identification. The BioBERT model achieved an F1 score of 0.90 for relationship prediction, while ChatGPT prompts reached 0.99. Entity type recognition proved more challenging, with high semantic similarity between types impacting accuracy. Our methodology significantly enhances relationship identification from social media data, particularly for dietary supplement usage, offering promising methods for improved post-market surveillance and public health monitoring. This work demonstrates the potential of combining traditional NLP models with large language models for complex text analysis tasks in healthcare.
Speaker(s):
Ying Liu, PhD
University of Minnesota
From Complex to Comprehensible: A TF-IDF Approach for Hierarchical Aggregation of Clinical Conditions in SNOMED CT
Presentation Time: 03:45 PM - 04:00 PM
Abstract Keywords: Ontologies, Cohort Discovery, EHR-based Phenotyping, Reproducible Research Methods and Tools
Primary Track: Clinical Research Informatics
Programmatic Theme: Real-World Evidence in Informatics: Bridging the Gap between Research and Practice
Structured data in electronic health records (EHRs) rely on controlled vocabularies for consistent representation, such as SNOMED CT, a comprehensive hierarchical ontology of clinical concepts. The granularity of SNOMED, while beneficial for detailed clinical documentation, can be problematic, and the lack of a gold-standard method for concept roll-up – the process of aggregating fine-grained concepts to more general levels – has limited the full potential of these granular concepts in data-driven clinical research. To address this, we propose an approach using Term Frequency-Inverse Document Frequency (TF-IDF), offering a flexible and clinically relevant method for concept aggregation in clinical informatics.
The TF-IDF measure at the center of this algorithm quantifies the importance of each concept relative to its occurrence in the broader SNOMED hierarchy. We introduce a granularity parameter to control how specific or generalized the roll-up should be. We evaluated the utility of this granularity-based approach to concept aggregation across two distinct domains: automating concept set creation and subphenotyping post-acute sequelae of SARS-CoV-2 infection (PASC).
In the concept set generation evaluation, the TF-IDF-based roll-up approach performed well for well-defined conditions like "Uncomplicated Diabetes", capturing approximately 80% of all 127 concepts within the concept set. However, for less clearly defined conditions like "Chronic Pain" with 245 concepts, it captured only 35% of the total concept set. In the subphenotyping evaluation, the raw SNOMED CT approach generated 5 clusters comprising 90 unique clinical conditions. The TF-IDF-based roll-up method simplified this into 3 clusters containing 42 rolled-up conditions, improving cluster separability and interpretability.
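The abstract does not spell out the algorithm, so the following is only a toy sketch of the general idea: TF-IDF weights computed over hierarchy-expanded concepts, plus a threshold that decides how far a concept is rolled up. The hierarchy, the patient records, the function names, and the choice to apply the `min_idf` threshold to the IDF component alone are all assumptions made for illustration, not the authors' method.

```python
import math
from collections import Counter

# Hypothetical toy hierarchy (child -> parent). Real SNOMED CT is a
# polyhierarchy, so a concept may have several parents.
PARENT = {
    "type 2 diabetes": "diabetes mellitus",
    "type 1 diabetes": "diabetes mellitus",
    "diabetes mellitus": "endocrine disorder",
    "hypothyroidism": "endocrine disorder",
}

def ancestors_and_self(concept):
    """Return the concept followed by its ancestors, nearest first."""
    chain = [concept]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain

def idf_table(records):
    """IDF per concept; a record counts for a concept if it holds the
    concept itself or any of its descendants."""
    df = Counter()
    for record in records:
        df.update({a for c in record for a in ancestors_and_self(c)})
    n = len(records)
    return {c: math.log(n / k) for c, k in df.items()}

def tf_idf(record, idf):
    """TF-IDF of each hierarchy-expanded concept within one record."""
    tf = Counter(a for c in record for a in ancestors_and_self(c))
    total = len(record)
    return {c: (count / total) * idf[c] for c, count in tf.items()}

def roll_up(concept, idf, min_idf):
    """Climb to the most general ancestor whose IDF is still >= min_idf.
    min_idf plays the role of a (hypothetical) granularity parameter."""
    chosen = concept
    for ancestor in ancestors_and_self(concept)[1:]:
        if idf.get(ancestor, 0.0) < min_idf:
            break
        chosen = ancestor
    return chosen

# Hypothetical patient records, each a list of coded conditions.
records = [
    ["type 2 diabetes"],
    ["type 1 diabetes"],
    ["hypothyroidism"],
    ["type 2 diabetes", "hypothyroidism"],
]
idf = idf_table(records)

print(roll_up("type 2 diabetes", idf, 0.5))  # type 2 diabetes
print(roll_up("type 2 diabetes", idf, 0.2))  # diabetes mellitus
print(roll_up("type 2 diabetes", idf, 0.0))  # endocrine disorder
```

In this sketch, lowering `min_idf` produces broader roll-ups, because ancestors occur in more records and so carry lower IDF than their descendants.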
Speaker(s):
Abhishek Bhatia, MS
University of North Carolina at Chapel Hill
Author(s):
Emily Pfaff, PhD, MS - UNC Chapel Hill School of Medicine; Tomas McIntee, PhD - University of North Carolina at Chapel Hill; Miles Crosskey, PhD - CoVar Applied Technologies; Eesha Pisal, MPS - University of North Carolina at Chapel Hill
Developing an Ontology for Pressure Injury Management Domain Knowledge
Presentation Time: 04:00 PM - 04:15 PM
Abstract Keywords: Knowledge Representation, Management, or Engineering, Clinical Decision Support for Translational/Data Science Interventions, Ontologies
Primary Track: Clinical Research Informatics
Programmatic Theme: Emerging Best Practices for Clinical Research Informatics Operations
Pressure injuries, also known as pressure ulcers or bedsores, are common in healthcare settings, particularly among patients with limited mobility. Early detection and personalized treatment strategies are essential for improving patient outcomes, optimizing care efficiency, and enhancing healthcare providers' competencies. Ontologies provide explicit definitions of domain-specific terms and serve as structured knowledge models for domain concepts, relationships, properties, and instances. This paper introduces the development of the Pressure Injury Management Ontology (PIMO), aimed at promoting interoperability across diverse data sources, improving data mining capabilities, and streamlining clinical data analysis to better predict and treat pressure injuries.
Speaker(s):
Suzan Ahmad, PhD
Rutgers University
Author(s):
Adam Bouras, PhD - CDC/OGHA
Developing an Annotation Corpus from Case Reports to Advance Knowledge Discovery of Rare Diseases
Presentation Time: 04:15 PM - 04:30 PM
Abstract Keywords: Natural Language Processing, Data Mining and Knowledge Discovery, Data Sharing/Interoperability
Primary Track: Clinical Research Informatics
Programmatic Theme: Real-World Evidence in Informatics: Bridging the Gap between Research and Practice
Case reports are valuable in advancing medical scientific knowledge, especially of rare diseases. This study presents the development of an annotated corpus for idiopathic pulmonary fibrosis (IPF). The corpus includes 15 IPF case reports, with 1,353 annotated sentences and 4,934 entities labeled for concepts including medical problems, treatments, tests, exposure, genetics, and social determinants of health. The overall inter-annotator agreement is 0.858. We aim to annotate a total of 121 IPF case reports, as well as the relations between entities. The annotation guideline is generalizable to other rare diseases and will be made publicly available, together with the resulting annotation corpus, through the OHNLP GitHub.
Speaker(s):
Liwei Wang, PhD
UTHealth
Author(s):
Taylor Harrison, M.B.A. - Mayo Clinic; Heling Jia, M.D. - Mayo Clinic; Qiuhao Lu, Ph.D. - University of Texas Health Science Center at Houston; Jinlian Wang, PhD - UTHealth; Rui Li, PhD - UTHealth; Andrew Wen, MS - University of Texas Health Science Center at Houston; Jennifer St. Sauver, MPH, PhD - Mayo Clinic; Wei-Qi Wei, MD, PhD - Vanderbilt University Medical Center; Jungwei Fan, Ph.D. - Mayo Clinic; Hongfang Liu, PhD - University of Texas Health Science Center at Houston
An Implemented Real-World-Data Pipeline for Standardization of Electronic Health Records in Precision Oncology
Presentation Time: 04:30 PM - 04:45 PM
Abstract Keywords: Clinical and Research Data Collection, Curation, Preservation, or Sharing, Data Standards, Natural Language Processing, Data Mining and Knowledge Discovery
Primary Track: Clinical Research Informatics
Programmatic Theme: Emerging Best Practices for Clinical Research Informatics Operations
Several use cases in precision oncology require accurately extracting and standardizing Real-World Data from Electronic Health Records (EHRs). We developed the infrastructure and a toolset incorporating data mining and natural language processing scripts to automatically retrieve selected descriptive and common endpoint variables from EHRs. This toolset was evaluated against a reference dataset of 106 lung cancer and 45 sarcoma patient cases pulled from two databases complying with the Precision Oncology Core Data Model (Precision-DM) and maintained by the Johns Hopkins Molecular Tumor Board and a research team. We accurately retrieved most descriptive EHR fields but less efficiently extracted the Date of Diagnosis and Treatment Start Date that supported calculating the Age at Diagnosis, Overall Survival, and Time to First Treatment (accuracy range 50%-86%). Our infrastructure and Precision-DM-based standardization could inspire similar efforts in other cancer centers; however, the toolset should be enhanced to improve accuracy for certain variables.
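To illustrate why errors in the two extracted dates propagate into the endpoint variables, here is a minimal sketch of how such variables are commonly derived from dates. The function names, the example dates, and the definitions used (completed years for age; days from diagnosis to first treatment; days from diagnosis to death or last contact for survival) are assumptions for illustration and are not taken from the toolset or from Precision-DM.

```python
from datetime import date

def age_at_diagnosis(birth, diagnosis):
    """Completed years between date of birth and date of diagnosis."""
    years = diagnosis.year - birth.year
    # Subtract one year if the birthday has not yet occurred in the diagnosis year.
    if (diagnosis.month, diagnosis.day) < (birth.month, birth.day):
        years -= 1
    return years

def days_between(start, end):
    """Number of days from start to end (negative if end precedes start)."""
    return (end - start).days

# Hypothetical patient dates, for illustration only.
birth = date(1960, 5, 20)
diagnosis = date(2021, 3, 10)
treatment_start = date(2021, 4, 1)
last_contact = date(2023, 3, 10)  # date of death or last follow-up

print(age_at_diagnosis(birth, diagnosis))        # 60
print(days_between(diagnosis, treatment_start))  # 22  (time to first treatment)
print(days_between(diagnosis, last_contact))     # 730 (overall survival, in days)
```

All three values depend on the Date of Diagnosis, so a single wrongly extracted date shifts every derived endpoint at once.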
Speaker(s):
Taxiarchis Botsis, MSc, MPS, PhD
Johns Hopkins University School of Medicine
A Preliminary Ontological Model for Assessment Instruments
Presentation Time: 04:45 PM - 05:00 PM
Abstract Keywords: Ontologies, Clinical Decision Support for Translational/Data Science Interventions, Data Integration
Primary Track: Clinical Research Informatics
Programmatic Theme: Real-World Evidence in Informatics: Bridging the Gap between Research and Practice
The use of standardized assessments is ubiquitous in healthcare. Despite their ubiquity, there are significant gaps in both the representation and content coverage of assessment instruments in mainstream clinical terminology. Focusing on assessment instruments used in mental health, we propose a preliminary ontological model for the unambiguous representation of such instruments and describe the development of the model, propose a set of preliminary defining attributes, and provide exemplars of the implemented model.
Speaker(s):
Piper Ranallo, PhD
University of Minnesota
Author(s):
Genevieve Melton-Meaux, MD, PhD - University of Minnesota; Rui Zhang, PhD, FAMIA - University of Minnesota, Twin Cities; James Cimino, MD, FACMI, FACP, FAMIA - Department of Biomedical Informatics and Data Science, Heersink School of Medicine, University of Alabama at Birmingham; Robert Krueger, PhD - University of Minnesota