American Medical Informatics Association - Crowdsourcing-Based Knowledge Graph Construction for Drug Side Effects Using Large Language Models with an Application on Semaglutide

Identifying Missing IS-A Relations in SNOMED CT with Fine-Tuned Pre-trained Language Models and Non-lattice Subgraphs

Presentation Time: 09:45 AM - 10:00 AM

Abstract Keywords: Controlled Terminologies, Ontologies, and Vocabularies, Large Language Models (LLMs), Information Retrieval, Data Standards, Artificial Intelligence
Primary Track: Foundations
Programmatic Theme: Clinical Research Informatics

Ensuring the completeness of IS-A relations in SNOMED CT is crucial for maintaining its accuracy in clinical applications. In this study, we propose a hybrid approach leveraging non-lattice subgraphs and pre-trained language models (PLMs) to identify missing IS-A relations in SNOMED CT. We fine-tuned four BERT-based models (BERT, DistilBERT, DeBERTa, and BioClinicalBERT) and four generative large language models (LLMs) (BioMistral, Llama3, Gemma2, and Phi-4). Missing IS-A relations were identified through consensus predictions by all eight models. DeBERTa achieved the best performance (precision: 0.96, recall: 0.97, F1-score: 0.965) for IS-A relation prediction. Our approach identified 678 potential missing IS-A relations in SNOMED CT (March 2023 US Edition). A domain expert manually reviewed 100 randomly selected cases and confirmed 93 as valid (93% precision). These results demonstrate the effectiveness of fine-tuned PLMs in detecting missing IS-A relations within non-lattice subgraphs, offering a promising avenue for improving SNOMED CT's quality.
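
The consensus step and the reported F1-score can be illustrated with a minimal Python sketch. It assumes each model returns a boolean IS-A decision per candidate concept pair; the function names, model names, and toy concept pairs are hypothetical and are not taken from the study.

```python
from typing import Dict, List, Tuple

# A candidate pair is (child concept, proposed parent concept).
Pair = Tuple[str, str]


def consensus_is_a(predictions: Dict[str, Dict[Pair, bool]]) -> List[Pair]:
    """Return the pairs that every model predicts as IS-A.

    `predictions` maps a model name to that model's per-pair decisions.
    Pairs that any model did not score are ignored.
    """
    models = list(predictions.values())
    if not models:
        return []
    shared = set(models[0])
    for decisions in models[1:]:
        shared &= set(decisions)
    return sorted(p for p in shared if all(d[p] for d in models))


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy input with two models instead of eight.
toy_predictions = {
    "model_a": {
        ("Viral pneumonia", "Pneumonia"): True,
        ("Pneumonia", "Fracture"): False,
    },
    "model_b": {
        ("Viral pneumonia", "Pneumonia"): True,
        ("Pneumonia", "Fracture"): True,
    },
}

agreed = consensus_is_a(toy_predictions)
```

With the abstract's figures, `f1_score(0.96, 0.97)` evaluates to roughly 0.965, which matches the reported F1-score.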

Speaker:
Xubing Hao, Bachelor's degree
The University of Texas Health Science Center at Houston

Authors:
Xubing Hao, Bachelor's degree - The University of Texas Health Science Center at Houston; Rashmie Abeysinghe, PhD - The University of Texas Health Science Center at Houston; Jay Shi, MD - Intermountain Health; GQ Zhang, PhD - The University of Texas Health Science Center at Houston; Licong Cui, PhD - The University of Texas Health Science Center at Houston (UTHealth Houston);

Crowdsourcing-Based Knowledge Graph Construction for Drug Side Effects Using Large Language Models with an Application on Semaglutide

Presentation Time: 10:00 AM - 10:15 AM

Abstract Keywords: Information Extraction, Large Language Models (LLMs), Social Media and Connected Health, Real-World Evidence Generation, Drug Discoveries, Repurposing, and Side-effect
Primary Track: Applications
Programmatic Theme: Public Health Informatics

Social media is a rich source of real-world data that captures valuable patient experience information for pharmacovigilance. However, mining data from unstructured and noisy social media content remains a challenging task. We present a systematic framework that leverages large language models (LLMs) to extract medication side effects from social media and organize them into a knowledge graph (KG). We apply this framework to semaglutide for weight loss using data from Reddit. Using the constructed knowledge graph, we perform comprehensive analyses to investigate reported side effects across different semaglutide brands over time. These findings are further validated through comparison with adverse events reported in the FAERS database. The results provide patient-centered insights into semaglutide’s side effects that complement its safety profile and the current knowledge base on semaglutide for both healthcare professionals and patients. Our work demonstrates the feasibility of using LLMs to transform social media data into structured KGs for pharmacovigilance.

Speaker:
Zhijie Duan, B.S.
University of Pennsylvania

Authors:
Zhijie Duan, B.S. - University of Pennsylvania; Kai Wei, Master - University of Michigan; Zhaoqian Xue, BSc - Georgetown University; Jiayan Zhou, PhD - Stanford University School of Medicine; Shu Yang, PhD - University of Pennsylvania; Siyuan Ma, PhD - Vanderbilt University; Jin Jin, PhD - University of Pennsylvania; Lingyao Li, PhD - University of South Florida;

Knowledge-Graph-Enhanced Graph Neural Network for Early Prediction of Rapid Progression Subtype in Parkinson’s Disease

Presentation Time: 10:15 AM - 10:30 AM

Abstract Keywords: Machine Learning, Knowledge Representation and Information Modeling, Omics (genomics, metabolomics, proteomics, transcriptomics, etc.) and Integrative Analyses
Primary Track: Applications

We propose a knowledge-graph-enhanced graph neural network (GNN) framework for early identification of fast-progressing Parkinson’s disease patients using multimodal data. By incorporating curated biomedical knowledge from iBKH, our model prioritizes mechanistically relevant features and captures higher-order interactions. It outperforms baseline models, achieving superior accuracy and generalizability. This approach supports early subtype classification and highlights the promise of knowledge-driven precision medicine in neurodegenerative diseases.

Speaker:
Zuoyu Yan, Ph.D.
Weill Cornell Medicine

Authors:
Zuoyu Yan, Ph.D. - Weill Cornell Medicine; Haoyang Li, PhD - Weill Cornell Medicine; Chang Su, PhD - Weill Cornell Medicine; Fei Wang, PhD - Weill Cornell Medicine;

DGSurv: Dynamic Graph-Based Multimodal Learning for Interpretable Cancer Survival Prediction

Presentation Time: 10:30 AM - 10:45 AM

Abstract Keywords: Machine Learning, Deep Learning, Artificial Intelligence, Omics (genomics, metabolomics, proteomics, transcriptomics, etc.) and Integrative Analyses
Primary Track: Applications

Multimodal learning in cancer research offers transformative potential for enhancing medical care and guiding clinical decisions. However, most analyses rely on unimodal inputs or employ simplistic multimodal fusion techniques that do not optimally integrate the diverse data types. There is also a critical need for better interpretative methods to fully exploit the depth of multimodal patient data. To address these issues, we propose DGSurv, a novel multimodal learning approach that uses a graph neural network (GNN) to dynamically map inter-modality relationships for cancer survival prediction. Empirical evaluations on four cancer datasets from The Cancer Genome Atlas Program (TCGA) show that DGSurv outperforms existing fusion techniques. Beyond predictive performance, the approach draws on the full spectrum of multimodal data and improves the interpretability of the analysis, highlighting its potential to inform more accurate clinical decision-making.

Speaker:
Sajjad Shahabi, PhD Student
University of Southern California

Authors:
Sajjad Shahabi, PhD Student - University of Southern California; Zijun Cui, Ph.D. - Michigan State University; Ruishan Liu, Ph.D. - University of Southern California; Joseph Carlson, M.D., Ph.D. - City of Hope; Yan Liu, Ph.D. - University of Southern California;

Adaptive Constraint Relaxation in Personalized Nutrition Recommendations: An LLM-Driven Knowledge Graph Retrieval Approach

Presentation Time: 10:45 AM - 11:00 AM

Abstract Keywords: Information Retrieval, Personal Health Informatics, Large Language Models (LLMs), Data Mining, Knowledge Representation and Information Modeling, Natural Language Processing
Primary Track: Applications
Programmatic Theme: Clinical Informatics

Personalized food recommendation systems must balance various constraints, including medical guidelines, nutritional needs, and individual preferences. However, existing methods often struggle with overly restrictive queries and frequently fail to generate recommendations when no exact match exists. To address this challenge, we propose an adaptive knowledge graph (KG) retrieval framework that integrates large language models (LLMs) for intelligent constraint relaxation. Our approach dynamically prioritizes constraints, ensuring that critical dietary requirements remain intact while selectively relaxing less essential ones. By leveraging LLM-driven constraint analysis and structured relaxation strategies, our system significantly enhances recommendation coverage without compromising key dietary needs, while maintaining the best recommendation performance. Experimental results on the original and extended-constraint datasets demonstrate that our method successfully retrieves recommendations in cases where previous approaches fail, achieving higher retrieval accuracy and a balanced tradeoff between flexibility and adherence to dietary constraints.
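
The idea of relaxing less essential constraints while keeping critical ones fixed can be sketched in Python as follows. This is an illustrative sketch only, not the authors' framework: the `Constraint` class, the `retrieve_with_relaxation` function, the hard-coded priorities (which the abstract says an LLM would derive), and the toy recipes are all assumptions, and an in-memory list stands in for the knowledge graph.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Recipe = Dict[str, object]


@dataclass(frozen=True)
class Constraint:
    name: str
    test: Callable[[Recipe], bool]
    critical: bool      # critical constraints are never relaxed
    priority: int = 0   # among non-critical ones, lowest is relaxed first


def retrieve_with_relaxation(
    recipes: List[Recipe], constraints: List[Constraint]
) -> Tuple[List[Recipe], List[str]]:
    """Drop non-critical constraints one at a time until something matches.

    Returns the matching recipes and the names of the relaxed constraints.
    """
    # Non-critical constraints sort first, ordered by ascending priority.
    active = sorted(constraints, key=lambda c: (c.critical, c.priority))
    relaxed: List[str] = []
    while True:
        hits = [r for r in recipes if all(c.test(r) for c in active)]
        if hits:
            return hits, relaxed
        if not active or active[0].critical:
            return [], relaxed  # only critical constraints remain
        relaxed.append(active.pop(0).name)


# Hypothetical toy data.
recipes = [
    {"name": "lentil soup", "allergens": set(), "sodium_mg": 400,
     "cuisine": "indian", "minutes": 45},
    {"name": "peanut noodles", "allergens": {"peanut"}, "sodium_mg": 300,
     "cuisine": "thai", "minutes": 15},
]
constraints = [
    Constraint("no_peanut", lambda r: "peanut" not in r["allergens"], True),
    Constraint("low_sodium", lambda r: r["sodium_mg"] <= 500, True),
    Constraint("cuisine", lambda r: r["cuisine"] == "thai", False, 1),
    Constraint("max_minutes", lambda r: r["minutes"] <= 20, False, 2),
]

hits, relaxed = retrieve_with_relaxation(recipes, constraints)
```

In this toy case no recipe satisfies all four constraints, so the sketch relaxes the cuisine preference and then the time limit, and returns the lentil soup while the allergy and sodium requirements stay enforced.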

Speaker:
Pengfei Zhang, Bachelor
University of California-Irvine

Authors:
Pengfei Zhang, Bachelor - University of California-Irvine; Mohbat Fnu, MSc - Rensselaer Polytechnic Institute; Yutong Song, B.S. - University of California-Irvine; Oshani Seneviratne, PhD - Rensselaer Polytechnic Institute; Zhongqi Yang, MSc - University of California, Irvine; Iman Azimi, PhD - University of California, Irvine; Amir Rahmani, PhD - University of California, Irvine;


S114: Guardians of the Graphaxy
