S116: Lone Star Logic: Ontologies That Hold the Line (Oral Presentations)


11/11/2026 | 9:45 AM – 11:00 AM | Room 11
Presentation Type: Oral Presentations

OmiKG: An Ontology and Knowledge Graph for Mechanistic Root Cause Analysis in Functional Medicine

Presentation Type: Paper - Regular
Presentation Time: 09:45 AM - 09:57 AM

Abstract Keywords: Knowledge Representation and Information Modeling, Controlled Terminologies, Ontologies, and Vocabularies, Large Language Models (LLMs), Information Extraction, Causal Inference, Natural Language Processing, Clinical Decision Support, Evaluation
Programmatic Theme: Clinical Research Informatics

Chronic disease management requires a transition from symptom-based treatment to systems medicine; however, current biomedical ontologies lack the causal architecture needed for mechanistic etiological inference. This study introduces OmiKG, a domain-specific ontology and knowledge graph engineered for Functional Medicine, modeling multi-factorial causal trajectories across four layers with OWL-formalized causal constraints. A Competency Question-driven, synthesis-first Large Language Model pipeline extracted causal triplets from PubMed with provenance. Evaluation across 20 competency questions demonstrated that system choice significantly differentiates response quality, with OmiKG achieving the highest Mechanistic Depth (3.49 ± 0.54), exceeding RAG (2.90 ± 0.50) and scoring above GPT-5.2 (3.17 ± 0.45). OmiKG identified 88% of assessed mechanisms versus 6% for GPT-5.2 and 0% for RAG. No system dominated all dimensions; clinical utility remained lowest (2.67–2.98). Citation-level fidelity requires improvement before deployment. Nevertheless, OmiKG provides a transparent, traceable causal knowledge structure representing mechanistic pathways that general-purpose language models do not surface.

Speaker(s):
Nhung Nguyen, Master
OmiGroup

Author(s):
Nhung Nguyen, Master - OmiGroup; Ngoc Khuc, Bachelor of Science - OmiNext JSC; Long Phi, Bachelor of Science and Technology - OmiNext JSC; Thuy Tran, Bachelor of Science - OmiNext JSC, Hanoi, Vietnam; Tuong Tran, Bachelor of Science - OmiNext JSC, Hanoi, Vietnam; Huong Tran, Bachelor of Science - OmiNext JSC, Hanoi, Vietnam; Huan Khuc, Master Degree - Quang Ninh General Hospital, Quang Ninh, Vietnam; Ngan Hoang, PhD - National Institute of Nutrition, Hanoi, Vietnam; School of Medicine and Dentistry, Griffith University, Gold Coast, QLD, Australia; Dung Tran, Master of Science - OmiNext JSC, Hanoi, Vietnam;
Vaxafe: An Ontology-Driven Semantic Integration Platform for Precision Vaccine Safety Surveillance

Presentation Type: Paper - Regular
Presentation Time: 09:57 AM - 10:09 AM

Abstract Keywords: Controlled Terminologies, Ontologies, and Vocabularies, Knowledge Representation and Information Modeling, Data transformation/ETL, Large Language Models (LLMs), Natural Language Processing, Data Mining, Public Health, Patient Safety
Programmatic Theme: Public Health Informatics

Post-marketing vaccine safety monitoring relies on spontaneous reporting systems such as VAERS, where semantic heterogeneity and unstructured clinical text can obscure safety signals. To address this challenge, we developed Vaxafe, a web-accessible platform that maps VAERS vaccine adverse event (VAE) reports to the Vaccine Ontology (VO) and the Ontology of Adverse Events (OAE). Using a multi-tiered pipeline incorporating LLM-assisted “safety nets” and fuzzy matching, Vaxafe processed 2.28 million records. The system recovered 30.9% of ambiguous reports that would otherwise be lost and achieved 71.32% semantic coverage across symptom occurrences. Three web modules (individual VAE case query, conditional VAE cohort query, and statistical VAE analysis) enable interactive exploration. Application of Vaxafe revealed formulation-dependent safety patterns, identifying a Guillain-Barré Syndrome (GBS) signal in inactivated and some live-attenuated influenza vaccines, while recombinant protein formulations showed minimal signal. Overall, Vaxafe provides a rigorous ontology-driven environment for precision vaccine safety monitoring.

Speaker(s):
Feng-Yu Yeh, Masters Degree
University of Michigan Medical School - He Lab

Author(s):
Feng-Yu Yeh, Masters Degree - University of Michigan Medical School - He Lab; Jie Zheng, PhD - University of Michigan; Yongqun He, PhD - University of Michigan;
Automated Standardization of Legacy Biomedical Metadata Using an Ontology-Constrained LLM Agent

Presentation Type: Paper - Regular
Presentation Time: 10:09 AM - 10:21 AM

Abstract Keywords: Artificial Intelligence, Controlled Terminologies, Ontologies, and Vocabularies, Large Language Models (LLMs), Interoperability and Health Information Exchange, Data Modernization, Quantitative Methods
Programmatic Theme: Academic Informatics / LIEAF

Scientific metadata are often incomplete and noncompliant with community standards, limiting dataset findability, interoperability, and reuse. When reporting guidelines exist, they typically lack machine-actionable representations. Producing FAIR datasets requires encoding metadata standards as machine-actionable templates with rich field specifications and precise value constraints. Recent work has shown that LLMs guided by field names and ontology constraints can improve metadata standardization, but these approaches treat constraints as static text prompts, relying on the model’s training knowledge alone. We present an LLM-based metadata standardization system that queries authoritative biomedical terminology services in real time to retrieve canonically correct vocabulary terms on demand. We evaluate this approach on 839 legacy metadata records from the Human BioMolecular Atlas Program (HuBMAP) using an expert-curated gold standard for exact-match assessment. Our evaluation shows that augmenting the LLM with real-time tool access consistently improves prediction accuracy over the LLM alone across both ontology-constrained and non-ontology-constrained fields, demonstrating a practical, scalable approach to automated standardization of biomedical metadata.

Speaker(s):
Josef Hardi, MSc
Stanford University

Author(s):
Josef Hardi, MSc - Stanford University; Martin O'Connor, MSc - Stanford University; Marcos Martínez-Romero, PhD - Stanford University; Jean Rosario, PhD - University of Pennsylvania; Stephen Fisher, PhD - University of Pennsylvania; Mark Musen, MD, PhD - Stanford University;
Evaluation of RAG-Based Approach for SNOMED CT Concept Mapping of Multi-Institutional Local Clinical Terminology

Presentation Type: Paper - Regular
Presentation Time: 10:21 AM - 10:33 AM

Abstract Keywords: Data Standards, Interoperability and Health Information Exchange, Large Language Models (LLMs)
Programmatic Theme: Clinical Research Informatics

Heterogeneous local clinical terminologies in multi-institutional electronic health records hinder clinical data integration and effective use of medical AI, creating a need for automated standard terminology mapping. This study developed and evaluated a RAG-based pipeline to map multi-institutional local clinical terminologies to SNOMED CT. Clinical terminology data (n = 902,488) were extracted from diagnosis, chief complaint, surgery, and procedure fields of electronic health records from nine healthcare institutions. Through terminology normalization and expert validation, a ground truth dataset of 3,000 SNOMED CT concepts was constructed. Using this dataset, we evaluated a pipeline composed of sparse retrieval, dense retrieval, ensemble retrieval, and reranking. Results showed that Top-10 accuracy reached 0.78, indicating that retrieval-based approaches support candidate generation for automated terminology mapping. However, Top-1 accuracy remained at 0.52, suggesting that fully automated mapping may be unreliable. These findings indicate that the pipeline is most suitable for a human-in-the-loop workflow where candidate concepts are reviewed by clinical experts.
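
To illustrate the Top-1 and Top-10 figures in this abstract: Top-k accuracy is commonly computed as the share of local terms whose expert-assigned concept appears among the first k ranked candidates. The sketch below assumes that definition. The function name, the term strings, and the concept IDs are hypothetical placeholders, not the authors' code and not real SNOMED CT identifiers.

```python
from typing import Dict, List


def top_k_accuracy(ranked: Dict[str, List[str]], gold: Dict[str, str], k: int) -> float:
    """Fraction of local terms whose gold concept ID is among the first k candidates."""
    if not gold:
        raise ValueError("gold must not be empty")
    hits = sum(
        1 for term, concept in gold.items()
        if concept in ranked.get(term, [])[:k]
    )
    return hits / len(gold)


# Hypothetical toy data: candidate lists as a retriever/reranker might return them.
ranked_candidates = {
    "chest pain": ["C1", "C2", "C3"],
    "htn": ["C9", "C4", "C5"],
    "appendectomy": ["C7", "C8", "C6"],
    "sob": ["C10", "C11", "C12"],
}
# Hypothetical expert-validated ground truth (one concept per local term).
gold_standard = {"chest pain": "C1", "htn": "C4", "appendectomy": "C6", "sob": "C99"}

top1 = top_k_accuracy(ranked_candidates, gold_standard, k=1)
top3 = top_k_accuracy(ranked_candidates, gold_standard, k=3)
print(f"Top-1: {top1:.2f}, Top-3: {top3:.2f}")
```

A gap between Top-1 and a larger k, as in this toy data, means the correct concept is often retrieved but not ranked first, which is the situation where a reviewer choosing among candidates adds value.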

Speaker(s):
Youngeun Kim, MSN
Kangwon National University

Author(s):
Chansik Kim, Researcher/Ph.D Candidate - The Catholic University Of Korea; Sangho Lee, MD, PhD - Kyung Hee University Hospital at Gangdong; Mijeong Park, MSN - Kangwon National University (Wonju); Minseong Kim, BSN - Kangwon National University (Wonju); Jiho Kim, MS - The Catholic University of Korea; Minjee Kim, MS - Kyung Hee University Hospital at Gangdong; Nayoung Chi, BSN - Kyung Hee University Hospital at Gangdong; Doyon Kim, BSN - Seoul National University Hospital; Jisan Lee, Ph.D. - Kangwon National University (Wonju); Taehoon Ko, Ph.D. - The Catholic University of Korea College of Medicine;
Youngeun Kim, MSN - Kangwon National University
CTG-DB: An Ontology-Based Transformation of ClinicalTrials.gov to Enable Cross-Trial Drug Safety Analyses

Presentation Type: Paper - Regular
Presentation Time: 10:33 AM - 10:45 AM

Abstract Keywords: Controlled Terminologies, Ontologies, and Vocabularies, Knowledge Representation and Information Modeling, Data transformation/ETL, Data Standards, Data Mining, Real-World Evidence Generation
Programmatic Theme: Clinical Research Informatics

ClinicalTrials.gov (CT.gov) is the largest publicly accessible registry of clinical studies, yet its registry-oriented architecture and heterogeneous adverse event (AE) terminology limit systematic pharmacovigilance (PV) analytics. AEs are typically recorded as investigator-reported text rather than standardized identifiers, requiring manual reconciliation to identify coherent safety concepts. We present the ClinicalTrials.gov Transformation Database (CTG-DB), an open-source pipeline that ingests the complete CT.gov XML archive and produces a relational database aligned to standardized AE terminology using the Medical Dictionary for Regulatory Activities (MedDRA). CTG-DB preserves arm-level denominators, represents placebo and comparator arms, and normalizes AE terminology using deterministic exact and fuzzy matching to ensure transparent and reproducible mappings. This framework enables concept-level retrieval and cross-trial aggregation for scalable placebo-referenced safety analyses and integration of clinical trial evidence into downstream PV signal detection.
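
As a rough illustration of the "deterministic exact and fuzzy matching" mentioned in this abstract, the sketch below tries a normalized exact lookup first and falls back to a similarity match above a fixed cutoff. It is an assumption about how such a step could look, not the CTG-DB implementation. The dictionary entries, the codes, the cutoff value, and the function names are all hypothetical, and Python's standard `difflib` stands in for whatever matcher the authors use.

```python
from difflib import get_close_matches
from typing import Optional, Tuple

# Hypothetical mini-dictionary; a real pipeline would load licensed MedDRA terms.
DICTIONARY = {
    "headache": "PT_HEADACHE",
    "nausea": "PT_NAUSEA",
    "injection site pain": "PT_INJ_SITE_PAIN",
}


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivially different strings compare equal."""
    return " ".join(text.lower().split())


def map_term(raw: str, cutoff: float = 0.85) -> Tuple[Optional[str], str]:
    """Return (code, method), where method records how the mapping was made."""
    key = normalize(raw)
    if key in DICTIONARY:
        return DICTIONARY[key], "exact"
    # Fixed cutoff and sorted candidates keep the fuzzy step repeatable across runs.
    close = get_close_matches(key, sorted(DICTIONARY), n=1, cutoff=cutoff)
    if close:
        return DICTIONARY[close[0]], "fuzzy"
    return None, "unmapped"


for reported in ["  Headache ", "headach", "rash"]:
    print(repr(reported), "->", map_term(reported))
```

Recording the method alongside each code is one way to keep mappings auditable: fuzzy matches can be reviewed separately from exact ones, and unmapped terms stay visible rather than being dropped.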

Speaker(s):
Jeffery Painter, MS, JD
GSK

Author(s):
François Haguinet, MS - GSK; Andrew Bate, PhD - GSK;
Jeffery Painter, MS, JD - GSK
Ensemble Logic for Symbolic Representation of Sleep Medicine Guidelines

Presentation Type: Paper - Student
Presentation Time: 10:45 AM - 10:57 AM

Abstract Keywords: Knowledge Representation and Information Modeling, Clinical Decision Support, Workflow
Programmatic Theme: Clinical Research Informatics

The AASM Manual is the clinical standard for polysomnography (PSG) scoring, but its narrative rules can admit multiple reasonable interpretations, contributing to inter-scorer variability and implementation differences across studies and software systems. We present a formal framework for translating sleep-scoring rules into Rational Ensemble Logic (QEL), a dense-time formalism that combines first-order quantification with metric temporal operators. Using an extraction-and-compilation procedure, we identified 18 unique atomic propositions and derived 12 final specifications corresponding to clinically scoreable AASM events. Back-translation of QEL specifications into clinician-facing language retained high semantic fidelity to the original scoring narratives (embedding cosine similarity: 79.3; 95% CI: 79.0–79.7) despite low lexical overlap (ROUGE-L: 18.3; 95% CI: 17.6–18.9). Formalization also clarifies latent ambiguities, including implicit physiological latencies and overlapping exclusions. This framework yields executable, rigorous rule specifications for computational phenotyping, more consistent implementation across datasets, and standardized open-source PSG analysis.

Speaker(s):
Jiahao Fan, PhD
University of Texas Health Science Center at Houston

Author(s):
Jiahao Fan, PhD - University of Texas Health Science Center at Houston; Xiaojin Li, Ph.D. - University of Texas Health Science Center at Houston; Yan Huang, Ph.D - UT Health Science Center; Xubing Hao, Doctorate - The University of Texas Health Science Center at Houston; Licong Cui, PhD - The University of Texas Health Science Center at Houston (UTHealth Houston); GQ Zhang, PhD - The University of Texas Health Science Center at Houston;

11/11/2026 11:00 AM (Central Time (US & Canada))


© 2026 American Medical Informatics Association. All Rights Reserved.