11/16/2025 | 3:15 PM – 4:30 PM | Room 5
S05: Ontologies and Knowledge Engineering in the Age of LLMs
Presentation Type: Oral Presentations
A Machine-Assisted Framework for Ontology Development and Standardization: Case Study in Digital Health Technologies
Presentation Time: 03:15 PM - 03:30 PM
Abstract Keywords: Data Standards, Knowledge Representation and Information Modeling, Large Language Models (LLMs)
Primary Track: Foundations
Programmatic Theme: Clinical Research Informatics
Digital health technologies (DHTs) continue to reshape healthcare by enabling personalized care, improving patient outcomes, and accelerating clinical research. However, the surge in DHT-related literature creates new challenges in effectively organizing, retrieving, and applying the resulting knowledge. Ontologies, structured frameworks for categorizing and connecting concepts, are central to meeting these challenges, yet traditional ontology development in digital health often depends on manual processes, limiting efficiency, scalability, and cross-disciplinary adaptability. Building on previous work categorizing DHTs, we propose a new framework combining DHT lexicon extraction, ontology enrichment, and human-in-the-loop validation. In this study, we illustrate how the concept of an “adaptive ontology,” powered by large language models (LLMs), can classify and enhance DHT ontologies systematically yet semi-automatically, providing a practical path to managing the evolving landscape of digital health.
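As a rough illustration of what one step in such a human-in-the-loop enrichment pipeline might look like (the names below, such as ask_llm and DHT_CLASSES, are hypothetical and not drawn from the study):

```python
# Hypothetical sketch of one "adaptive ontology" enrichment step: an LLM
# proposes a parent class for a newly extracted DHT term, and the suggestion
# is queued for human review. Illustrative only; not the authors' code.

from dataclasses import dataclass

DHT_CLASSES = ["Wearable sensor", "Mobile health app", "Telehealth platform", "Ingestible sensor"]

@dataclass
class Suggestion:
    term: str
    proposed_parent: str
    status: str = "pending_review"   # flipped to accepted/rejected by a curator

def ask_llm(prompt: str) -> str:
    """Placeholder for a call to whichever LLM client is used."""
    raise NotImplementedError

def propose_parent(term: str) -> Suggestion:
    prompt = (
        f"Classify the digital health technology term '{term}' into exactly one of: "
        + ", ".join(DHT_CLASSES) + ". Answer with the class name only."
    )
    answer = ask_llm(prompt).strip()
    # Guard against hallucinated classes before sending to a human reviewer
    parent = answer if answer in DHT_CLASSES else "UNMAPPED"
    return Suggestion(term=term, proposed_parent=parent)
```

In such a loop, only suggestions accepted or corrected by a domain expert would be merged into the ontology.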
Speaker:
Fang Chen, Master - University of Texas Health Science Center at Houston
Authors:
Taylor Harrison, M.B.A. - Mayo Clinic;
Sunyang Fu, PhD, MHI - UTHealth;
Ling He, MS - Kansas City University College of Osteopathic Medicine;
Zhiyi Yue, MA - University of Texas Health Science Center at Houston;
Shuyu Lu, Master;
Liwei Wang, MD, PhD - UTHealth;
Xiaoyang Ruan, PhD - The University of Texas Health Science Center at Houston;
Hongfang Liu, PhD - University of Texas Health Science Center at Houston;
Facilitating Clinical Information Extraction with Synthetic Data and Ontology using Large Language Models
Presentation Time: 03:30 PM - 03:45 PM
Abstract Keywords: Natural Language Processing, Large Language Models (LLMs), Information Extraction, Controlled Terminologies, Ontologies, and Vocabularies
Primary Track: Foundations
The rapid growth of unstructured clinical text in electronic health records necessitates robust information extraction systems, yet their development is hindered by the scarcity of high-quality annotated data. This study explores the potential of large language models to generate synthetic data for clinical named entity recognition and examines its impact on model performance. We propose a novel framework that integrates self-verified synthetic data generation with domain-specific semantic mapping using SNOMED-CT. By leveraging GPT-4o-mini for synthetic data creation and refining its quality through iterative verification and anomaly detection, we systematically evaluate the influence of synthetic data quality and quantity on fine-tuning LLaMA-3-8B. Experimental results across four datasets (MTSamples, UTP, MIMIC-III, and i2b2) demonstrate that self-verification and semantic mapping significantly enhance synthetic data utility, improving model generalizability. Our findings highlight the importance of balancing human-annotated and synthetic data, with a 1:1 ratio emerging as the optimal configuration for performance gains. This study advances clinical NLP by providing a scalable approach to mitigating annotation challenges while improving model performance.
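A minimal sketch of the 1:1 mixing strategy the abstract reports as optimal, with toy records standing in for real annotated notes (mix_one_to_one is an illustrative helper, not the authors' pipeline):

```python
# Pair each human-annotated NER example with one self-verified synthetic
# example before fine-tuning; illustrative sketch only.

import random

def mix_one_to_one(human_examples, synthetic_examples, seed=13):
    """Return a shuffled training set with equal counts of human and synthetic data."""
    rng = random.Random(seed)
    n = min(len(human_examples), len(synthetic_examples))
    mixed = human_examples[:n] + rng.sample(synthetic_examples, n)
    rng.shuffle(mixed)
    return mixed

# Toy records; real examples would carry tokens and BIO tags.
human = [{"text": "Pt denies chest pain.", "entities": [("chest pain", "PROBLEM")]}]
synthetic = [{"text": "Patient reports mild dyspnea.", "entities": [("dyspnea", "PROBLEM")]},
             {"text": "No fever noted.", "entities": [("fever", "PROBLEM")]}]
print(len(mix_one_to_one(human, synthetic)))  # -> 2 (one human + one synthetic)
```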
Speaker:
Yan Hu, MS - UTHealth Science Center Houston
Authors:
Yan Hu, MS - UTHealth Science Center Houston;
Huan He, Ph.D. - Yale University;
Qingyu Chen, PhD - Yale University;
Xiaoqian Jiang, PhD - University of Texas Health Science Center at Houston;
Kirk Roberts, PhD - University of Texas Health Science Center at Houston;
Hua Xu, Ph.D - Yale University;
Ontology-based Semantic Similarity Measures for Clustering Medical Concepts in Drug Safety
Presentation Time: 03:45 PM - 04:00 PM
Abstract Keywords: Patient Safety, Controlled Terminologies, Ontologies, and Vocabularies, Bioinformatics, Data Mining, Knowledge Representation and Information Modeling, Informatics Implementation
Primary Track: Foundations
Programmatic Theme: Public Health Informatics
Semantic similarity measures (SSMs) are widely used in biomedical research but remain underutilized in pharmacovigilance. This study evaluates six ontology-based SSMs for clustering MedDRA Preferred Terms (PTs) in drug safety data. Using the Unified Medical Language System (UMLS), we assess each method’s ability to group PTs around medically meaningful centroids. A high-throughput framework with a Java API and Python/R interfaces was developed to support large-scale similarity computations. Results show that while path-based methods perform moderately, with F1 scores of 0.36 for WUPALMER and 0.28 for LCH, intrinsic information content (IC)-based measures, especially INTRINSIC_LIN and SOKAL, consistently yield better clustering accuracy (F1 score of 0.403). Validated against expert review and Standardised MedDRA Queries (SMQs), our findings highlight the promise of IC-based SSMs in enhancing pharmacovigilance workflows by improving early signal detection and reducing manual review.
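For readers unfamiliar with the measures named above, the following illustrative definitions show the general form of Wu-Palmer and Lin similarity, assuming concept depths and information-content (IC) values have already been derived from the UMLS hierarchy; this is a sketch, not the study's Java framework:

```python
import math

def wu_palmer(depth_lcs: int, depth_c1: int, depth_c2: int) -> float:
    """Wu-Palmer (path-based): 2*depth(LCS) / (depth(c1) + depth(c2))."""
    return 2.0 * depth_lcs / (depth_c1 + depth_c2)

def lin(ic_lcs: float, ic_c1: float, ic_c2: float) -> float:
    """Lin (IC-based): 2*IC(LCS) / (IC(c1) + IC(c2)); IC may be corpus- or intrinsic-based."""
    return 2.0 * ic_lcs / (ic_c1 + ic_c2)

def intrinsic_ic(num_descendants: int, total_concepts: int) -> float:
    """One common intrinsic IC variant: concepts with fewer descendants are more informative."""
    return 1.0 - math.log(num_descendants + 1) / math.log(total_concepts)

# Toy example: two PTs sharing a fairly specific ancestor in the hierarchy
print(round(wu_palmer(depth_lcs=5, depth_c1=7, depth_c2=6), 3))   # 0.769
print(round(lin(ic_lcs=0.62, ic_c1=0.81, ic_c2=0.77), 3))         # 0.785
```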
Speaker:
Jeffery Painter, MS, JD - GSK
Authors:
François Haguinet, MS - GlaxoSmithKline;
Gregory Powell, PharmD, MBA - GlaxoSmithKline;
Andrew Bate, PhD - GlaxoSmithKline;
From Food to Clinic: Mapping FoodOn to the UMLS to Enable Nutritional Decision Support
Presentation Time: 04:00 PM - 04:15 PM
Abstract Keywords: Controlled Terminologies, Ontologies, and Vocabularies, Knowledge Representation and Information Modeling, Data Modernization, Precision Medicine, Clinical Decision Support
Primary Track: Foundations
Programmatic Theme: Clinical Informatics
Food and nutrition knowledge is recognized as a fundamental factor for the health and well-being of communities; however, its integration into biomedical and health knowledge systems is limited by the absence of standardized ontologies that encapsulate food-related concepts. This study mapped FoodOn, an open-source food ontology, to the Unified Medical Language System (UMLS) Metathesaurus, a compendium of biomedical ontologies. As the first systematic mapping of a food ontology to the UMLS, the results of this study provide an ontological foundation for incorporating dietary data into clinical and public health workflows. The findings suggest that expanding the representation of food concepts in biomedical ontologies could enhance the potential to incorporate food and nutrition information into clinical decision-making and research. Furthermore, this work lays the groundwork for integrating food-based therapies from traditional medicine systems (e.g., Ayurveda and Traditional Chinese Medicine) into contemporary clinical knowledge frameworks to support more holistic approaches to health care.
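A hedged sketch of the label-level matching such a mapping effort typically starts from, using placeholder identifiers rather than verified FoodOn-UMLS correspondences:

```python
# Normalize FoodOn term labels and look them up against UMLS concept names.
# Identifiers and normalization rules are illustrative only.

def normalize(label: str) -> str:
    return " ".join(label.lower().replace("-", " ").split())

foodon_labels = {"FOODON:0000001": "soy milk", "FOODON:0000002": "kale (raw)"}  # placeholders
umls_names = {"soy milk": "C0000001", "kale": "C0000002"}                       # placeholders

mappings, unmapped = {}, []
for foodon_id, label in foodon_labels.items():
    key = normalize(label)
    if key in umls_names:
        mappings[foodon_id] = umls_names[key]
    else:
        unmapped.append(foodon_id)   # candidates for partial/semantic matching or manual review

print(mappings)   # {'FOODON:0000001': 'C0000001'}
print(unmapped)   # ['FOODON:0000002'] -> the '(raw)' qualifier blocks an exact match
```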
Speaker:
Neil Sarkar, PhD, MLIS - Rhode Island Quality Institute & Brown University
Author:
Neil Sarkar, PhD, MLIS - Rhode Island Quality Institute & Brown University;
Knowledge Engineering for Medical Vocabularies Using Large Language Models
Presentation Time: 04:15 PM - 04:30 PM
Abstract Keywords: Controlled Terminologies, Ontologies, and Vocabularies, Large Language Models (LLMs), Standards
Primary Track: Foundations
Programmatic Theme: Clinical Research Informatics
Medical vocabularies are essential tools for capturing, classifying, and analyzing healthcare data. However, the creation and maintenance of these vocabularies are often labor-intensive and costly. This preliminary study evaluates the feasibility of using large language models (LLMs) to automate three key tasks in medical vocabulary management: term similarity, subsumption, and grouping. Using 1,533 cardiovascular terms from SNOMED CT, we applied GPT-4o and assessed its performance on these three tasks against the OHDSI standardized vocabularies. While LLMs demonstrated high precision across tasks (0.78 for term similarity, 0.74 for term subsumption, 0.78 for term grouping), recall was notably lower (0.41 for term similarity, 0.08 for term subsumption, 0.52 for term grouping), indicating gaps in coverage. Overall, LLMs show promise for medical vocabulary tasks but require further refinement for clinical specificity and completeness. Future work should focus on enhancing recall, reducing hallucinations, and evaluating scalability across broader terminology sets.
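As an illustration of how precision and recall might be computed for the subsumption task, comparing LLM-proposed (parent, child) pairs against a reference set derived from the OHDSI standardized vocabularies (the concept pairs below are toy examples, not the study's data):

```python
def precision_recall(predicted: set, reference: set):
    """Precision and recall of predicted pairs against a reference set."""
    true_positives = len(predicted & reference)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    return precision, recall

predicted = {("Heart disease", "Myocardial infarction"),
             ("Heart disease", "Atrial fibrillation")}
reference = {("Heart disease", "Myocardial infarction"),
             ("Heart disease", "Cardiomyopathy"),
             ("Arrhythmia", "Atrial fibrillation")}

p, r = precision_recall(predicted, reference)
print(round(p, 2), round(r, 2))  # 0.5 0.33 -- a high-precision, low-recall gap looks like this
```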
Speaker:
Hsin Yi Chen, B.S. - Columbia University
Authors:
Hsin Yi Chen, B.S. - Columbia University;
Anna Ostropolets, PhD - Johnson and Johnson;
Chunhua Weng, PhD - Columbia University;
George Hripcsak, MD - Columbia University Irving Medical Center;