11/19/2025 | 9:45 AM – 11:00 AM | Room 8
S114: Guardians of the Graphaxy
Presentation Type: Oral Presentations
Identifying Missing IS-A Relations in SNOMED CT with Fine-Tuned Pre-trained Language Models and Non-lattice Subgraphs
Presentation Time: 09:45 AM - 10:00 AM
Abstract Keywords: Controlled Terminologies, Ontologies, and Vocabularies, Large Language Models (LLMs), Information Retrieval, Data Standards, Artificial Intelligence
Primary Track: Foundations
Programmatic Theme: Clinical Research Informatics
Ensuring the completeness of IS-A relations in SNOMED CT is crucial for maintaining its accuracy in clinical applications. In this study, we propose a hybrid approach leveraging non-lattice subgraphs and pre-trained language models (PLMs) to identify missing IS-A relations in SNOMED CT. We fine-tuned four BERT-based models (BERT, DistilBERT, DeBERTa, and BioClinicalBERT) and four generative large language models (LLMs): BioMistral, Llama3, Gemma2, and Phi-4. Missing IS-A relations were identified through consensus predictions by all eight models. DeBERTa achieved the best performance (precision: 0.96, recall: 0.97, F1-score: 0.965) for IS-A relation prediction. Our approach identified 678 potential missing IS-A relations in SNOMED CT (March 2023 US Edition), of which 100 randomly selected cases were manually reviewed by a domain expert, confirming 93 as valid (93% precision). These results demonstrate the effectiveness of fine-tuned PLMs in detecting missing IS-A relations within non-lattice subgraphs, offering a promising avenue for improving SNOMED CT's quality.
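The consensus step and the reported metrics can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the model labels (`model_a`, `model_b`), and the concept pairs are hypothetical placeholders, and each fine-tuned model is assumed to have already produced a True/False IS-A label for each candidate (child, parent) pair.

```python
from typing import Dict, List, Tuple

Pair = Tuple[str, str]  # (child concept, candidate parent concept); placeholder labels


def consensus_pairs(predictions: Dict[str, Dict[Pair, bool]]) -> List[Pair]:
    """Return the pairs that every model labels as IS-A (unanimous agreement)."""
    per_model = list(predictions.values())
    if not per_model:
        return []
    # Only pairs scored by every model can reach unanimous agreement.
    shared = set(per_model[0])
    for labels in per_model[1:]:
        shared &= set(labels)
    return sorted(p for p in shared if all(labels[p] for labels in per_model))


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Hypothetical predictions from two models (the abstract uses eight).
predictions = {
    "model_a": {("child_1", "parent_1"): True, ("child_2", "parent_2"): True},
    "model_b": {("child_1", "parent_1"): True, ("child_2", "parent_2"): False},
}
print(consensus_pairs(predictions))      # [('child_1', 'parent_1')]

# Figures quoted in the abstract.
print(round(f1_score(0.96, 0.97), 3))    # 0.965
print(93 / 100)                          # 0.93 (expert-confirmed precision on the sample)
```

Requiring unanimous agreement trades recall for precision: a pair is flagged only when no model dissents, which is consistent with the high precision observed in the manual review.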
Speaker:
Xubing Hao, Bachelor's degree
The University of Texas Health Science Center at Houston
Authors:
Xubing Hao, Bachelor's degree - The University of Texas Health Science Center at Houston; Rashmie Abeysinghe, PhD - The University of Texas Health Science Center at Houston; Jay Shi, MD - Intermountain Health; GQ Zhang, PhD - The University of Texas Health Science Center at Houston; Licong Cui, PhD - The University of Texas Health Science Center at Houston (UTHealth Houston);
Crowdsourcing-Based Knowledge Graph Construction for Drug Side Effects Using Large Language Models with an Application on Semaglutide
Presentation Time: 10:00 AM - 10:15 AM
Abstract Keywords: Information Extraction, Large Language Models (LLMs), Social Media and Connected Health, Real-World Evidence Generation, Drug Discoveries, Repurposing, and Side-effect
Primary Track: Applications
Programmatic Theme: Public Health Informatics
Social media is a rich source of real-world data that captures valuable patient experience information for pharmacovigilance. However, mining data from unstructured and noisy social media content remains a challenging task. We present a systematic framework that leverages large language models (LLMs) to extract medication side effects from social media and organize them into a knowledge graph (KG). We apply this framework to semaglutide for weight loss using data from Reddit. Using the constructed knowledge graph, we perform comprehensive analyses to investigate reported side effects across different semaglutide brands over time. These findings are further validated through comparison with adverse events reported in the FAERS database, providing patient-centered insights into semaglutide’s side effects that complement its known safety profile and the current knowledge base for both healthcare professionals and patients. Our work demonstrates the feasibility of using LLMs to transform social media data into structured KGs for pharmacovigilance.
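As a rough illustration of the graph-construction step, the sketch below aggregates already-extracted side-effect mentions into brand-to-side-effect edges with monthly post counts. It assumes the LLM extraction has been done upstream; the `Mention` record, the `build_graph` function, and the brand and post identifiers are hypothetical and not taken from the paper.

```python
from collections import defaultdict
from typing import Dict, Iterable, NamedTuple, Set, Tuple


class Mention(NamedTuple):
    """One extracted side-effect mention (assumed output of an upstream LLM step)."""
    post_id: str
    brand: str
    side_effect: str
    month: str  # "YYYY-MM"


Edge = Tuple[str, str]  # (brand, normalized side effect)


def build_graph(mentions: Iterable[Mention]) -> Dict[Edge, Dict[str, int]]:
    """Aggregate mentions into edges weighted by distinct posts per month."""
    posts: Dict[Edge, Dict[str, Set[str]]] = defaultdict(lambda: defaultdict(set))
    for m in mentions:
        edge = (m.brand, m.side_effect.strip().lower())
        posts[edge][m.month].add(m.post_id)  # a post counts once per edge and month
    return {
        edge: {month: len(ids) for month, ids in by_month.items()}
        for edge, by_month in posts.items()
    }


# Placeholder data; brands and posts are illustrative only.
mentions = [
    Mention("p1", "brand_a", "Nausea", "2024-01"),
    Mention("p1", "brand_a", "nausea", "2024-01"),  # same post, counted once
    Mention("p2", "brand_a", "nausea", "2024-01"),
    Mention("p3", "brand_b", "fatigue", "2024-02"),
]
graph = build_graph(mentions)
print(graph[("brand_a", "nausea")])  # {'2024-01': 2}
```

Counting distinct posts rather than raw mentions keeps a single long thread from dominating an edge weight; per-month counts of this kind could then be compared against an external source such as FAERS.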
Speaker:
Zhijie Duan, B.S.
University of Pennsylvania
Authors:
Zhijie Duan, B.S. - University of Pennsylvania; Kai Wei, Master - University of Michigan; Zhaoqian Xue, BSc - Georgetown University; Jiayan Zhou, PhD - Stanford University School of Medicine; Shu Yang, PhD - University of Pennsylvania; Siyuan Ma, PhD - Vanderbilt University; Jin Jin, PhD - University of Pennsylvania; Lingyao Li, PhD - University of South Florida;
Knowledge-Graph-Enhanced Graph Neural Network for Early Prediction of Rapid Progression Subtype in Parkinson’s Disease
Presentation Time: 10:15 AM - 10:30 AM
Abstract Keywords: Machine Learning, Knowledge Representation and Information Modeling, Omics (genomics, metabolomics, proteomics, transcriptomics, etc.) and Integrative Analyses
Primary Track: Applications
We propose a knowledge-graph-enhanced graph neural network (GNN) framework for early identification of fast-progressing Parkinson’s disease patients using multimodal data. By incorporating curated biomedical knowledge from iBKH, our model prioritizes mechanistically relevant features and captures higher-order interactions. It outperforms baseline models, achieving superior accuracy and generalizability. This approach supports early subtype classification and highlights the promise of knowledge-driven precision medicine in neurodegenerative diseases.
Speaker:
Zuoyu Yan, Ph.D.
Weill Cornell Medicine
Authors:
Zuoyu Yan, Ph.D. - Weill Cornell Medicine; Haoyang Li, PhD - Weill Cornell Medicine; Chang Su, PhD - Weill Cornell Medicine; Fei Wang, PhD - Weill Cornell Medicine;
DGSurv: Dynamic Graph-Based Multimodal Learning for Interpretable Cancer Survival Prediction
Presentation Time: 10:30 AM - 10:45 AM
Abstract Keywords: Machine Learning, Deep Learning, Artificial Intelligence, Omics (genomics, metabolomics, proteomics, transcriptomics, etc.) and Integrative Analyses
Primary Track: Applications
Multimodal learning in cancer research offers transformative potential for enhancing medical care and guiding clinical decisions. Most analyses rely on unimodal inputs or employ simplistic multimodal fusion techniques, which do not optimally integrate the diverse data types. Additionally, there is a critical need for enhanced interpretative methods to fully exploit the depth of multimodal patient data. To address these issues, we propose DGSurv, a novel multimodal learning approach that utilizes a graph neural network (GNN) to dynamically map inter-modality relationships for cancer survival prediction. We demonstrate the utility of our proposed approach on cancer survival prediction, highlighting its potential to inform more accurate clinical decision-making.
We perform empirical evaluations on four cancer datasets from The Cancer Genome Atlas Program (TCGA) and demonstrate that DGSurv outperforms existing fusion techniques.
Our study advances multimodal cancer analysis by harnessing the full spectrum of multimodal data while significantly improving interpretability.
Speaker:
Sajjad Shahabi, PhD Student
University of Southern California
Authors:
Sajjad Shahabi, PhD Student - University of Southern California; Zijun Cui, Ph.D. - Michigan State University; Ruishan Liu, Ph.D. - University of Southern California; Joseph Carlson, M.D., Ph.D. - City of Hope; Yan Liu, Ph.D. - University of Southern California;
Adaptive Constraint Relaxation in Personalized Nutrition Recommendations: An LLM-Driven Knowledge Graph Retrieval Approach
Presentation Time: 10:45 AM - 11:00 AM
Abstract Keywords: Information Retrieval, Personal Health Informatics, Large Language Models (LLMs), Data Mining, Knowledge Representation and Information Modeling, Natural Language Processing
Primary Track: Applications
Programmatic Theme: Clinical Informatics
Personalized food recommendation systems must balance various constraints, including medical guidelines, nutritional needs, and individual preferences. However, existing methods often struggle with overly restrictive queries, frequently failing to generate recommendations when no exact match exists. To address this challenge, we propose an adaptive knowledge graph (KG) retrieval framework that integrates Large Language Models (LLMs) for intelligent constraint relaxation. Our approach dynamically prioritizes constraints, ensuring that critical dietary requirements remain intact while selectively relaxing less essential ones. By leveraging LLM-driven constraint analysis and structured relaxation strategies, our system significantly enhances recommendation coverage without compromising key dietary needs, while maintaining the best recommendation performance. Experimental results on both the original and the extended-constraint datasets demonstrate that our method successfully retrieves recommendations in cases where previous approaches fail, achieving higher retrieval accuracy and a balanced tradeoff between flexibility and adherence to dietary constraints.
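The relaxation idea can be sketched as a simple loop: try all constraints, and if nothing matches, drop the least important non-critical constraint and retry. This is a minimal sketch under stated assumptions, not the authors' system. The `Constraint` class, the `retrieve_with_relaxation` function, the priority scores (assumed here to come from an LLM ranking step), and the food records are all hypothetical, and an in-memory list stands in for the knowledge graph.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Tuple

Food = Dict[str, Any]


@dataclass(frozen=True)
class Constraint:
    name: str
    check: Callable[[Food], bool]
    priority: int           # higher = more important (assumed LLM-assigned)
    critical: bool = False  # critical constraints are never relaxed


def retrieve_with_relaxation(
    foods: List[Food], constraints: List[Constraint]
) -> Tuple[List[Food], List[str]]:
    """Return matching foods and the names of any constraints that were relaxed."""
    # Order so the least important non-critical constraint sits at the end.
    active = sorted(constraints, key=lambda c: (c.critical, c.priority), reverse=True)
    relaxed: List[str] = []
    while True:
        matches = [f for f in foods if all(c.check(f) for c in active)]
        # Stop on a match, or when only critical constraints (or none) remain.
        if matches or not active or active[-1].critical:
            return matches, relaxed
        relaxed.append(active.pop().name)


# Placeholder catalogue and constraints, for illustration only.
foods = [
    {"name": "oatmeal", "sodium_mg": 5, "vegan": True, "calories": 150},
    {"name": "lentil soup", "sodium_mg": 400, "vegan": True, "calories": 180},
    {"name": "grilled chicken", "sodium_mg": 70, "vegan": False, "calories": 165},
]
constraints = [
    Constraint("low_sodium", lambda f: f["sodium_mg"] <= 140, priority=3, critical=True),
    Constraint("vegan", lambda f: f["vegan"], priority=2),
    Constraint("under_100_calories", lambda f: f["calories"] < 100, priority=1),
]
matches, relaxed = retrieve_with_relaxation(foods, constraints)
print([f["name"] for f in matches], relaxed)  # ['oatmeal'] ['under_100_calories']
```

In this example the strict query returns nothing, so the lowest-priority constraint is dropped while the critical sodium limit stays in force. Returning the list of relaxed constraints lets a caller explain to the user which preferences were not met.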
Speaker:
Pengfei Zhang, Bachelor
University of California, Irvine
Authors:
Pengfei Zhang, Bachelor - University of California, Irvine; Mohbat Fnu, MSc - Rensselaer Polytechnic Institute; Yutong Song, B.S. - University of California, Irvine; Oshani Seneviratne, PhD - Rensselaer Polytechnic Institute; Zhongqi Yang, MSc - University of California, Irvine; Iman Azimi, PhD - University of California, Irvine; Amir Rahmani, PhD - University of California, Irvine;
Category
Paper - Regular
11/19/2025 11:00 AM (Eastern Time (US & Canada))