American Medical Informatics Association

A Knowledge Graph Approach To Discovering Drug Combination Therapies Across The Phenome

Presentation Type: Paper - Student
Student Paper Competition Nominee

Presentation Time: 11:15 AM - 11:27 AM

Primary Track: Data Science/Artificial Intelligence

Combining two clinically approved drugs has the potential to improve treatment for common diseases. But with many thousands of possible combinations, clinically testing every pair of drugs against every common disease is not feasible. Here, we propose DRACO, a new machine learning method for discovering therapeutic drug combinations. Our model leverages a foundation model describing drug biology alongside a graph derived from clinical trials. We showcase DRACO's power to answer the question: given a drug and a health condition, what second drug would create an effective combination? In that task, 80% of our predictions have been previously reported. In the harder task of distinguishing the small number of reported combinations from millions of possible candidates, DRACO ranks 99.0% of held-out drug combinations in the top 0.1%. We expect DRACO to be a useful tool for proposing new therapies across thousands of disease phenotypes and drug candidates.

Speaker(s):
Jianfeng Ke, PhD Candidate
University of Massachusetts Lowell

Author(s):
Jianfeng Ke, PhD Candidate - University of Massachusetts Lowell; Tingjian Ge, PhD - University of Massachusetts Lowell; Rachel Melamed, PhD - University of Massachusetts Lowell;

A Novel Approach to Zero-Shot Drug-Drug Interaction Prediction Enabled by EHR-Augmented Knowledge Graphs

Presentation Type: Paper - Regular
Presentation Time: 11:27 AM - 11:39 AM

Primary Track: Data Science/Artificial Intelligence

With more and more prescription drugs being administered, screening for adverse drug-drug interactions (DDIs) is now a major pharmacovigilance challenge. Electronic Health Record (EHR)-based statistical methods produce noisy predictions with high false positive rates due to confounding factors. Knowledge Graph (KG)-based machine learning methods, while more accurate, cannot predict interactions for drugs absent from the original graph, lacking zero-shot capability. We present a novel approach that augments large-scale but incomplete biomedical KGs with statistically noisy but comprehensive real-world edges derived from EHRs. We hypothesize that the EHR-derived associations act as bridges connecting unseen drugs to the pharmacological knowledge in KGs, thus enabling zero-shot capability. To rigorously test this, we designed a KG-embedding experiment that isolates drugs during training while preserving their interactions for testing. Results quantitatively demonstrate that our approach specifically enables effective zero-shot DDI prediction.

Speaker(s):
Srijith Chinthalapudi, High School Student
Stony Brook University

Author(s):
Srijith Chinthalapudi, High School Student - Stony Brook University; Sandeep Mallipattu, MD - Stony Brook University; Alisa Yurovsky, PhD - Stony Brook University; Tengfei Ma, PhD - Stony Brook University;

Evaluating NLP Approaches to Extract Drug Indications

Presentation Type: Paper - Regular
Presentation Time: 11:39 AM - 11:51 AM

Primary Track: Data Science/Artificial Intelligence

Reliable drug-indication knowledge is essential for clinical decision support and pharmacovigilance, yet manual curation is labor-intensive and difficult to scale. This study evaluated nine natural language processing approaches to extract therapeutic indications from FDA Structured Product Labels, benchmarking against 1,838 manually curated indication statements from twenty commonly prescribed medications. Methods included dictionary-based matching (QuickUMLS), a biomedical-pretrained transformer (PubMedBERT), and seven large language models spanning general-purpose and medical domain-specialized architectures. General-purpose LLMs achieved the highest performance, with Gemma2 attaining the best F1-Score (0.568) despite being the smallest model (2B parameters). Contrary to expectations, biomedical-specialized LLMs underperformed general-purpose counterparts, while dictionary-based matching yielded excessive false positives (F1 = 0.106). Performance differed markedly by drug, with narrow indication profiles yielding near-perfect accuracy and broader or symptom-adjacent indications proving consistently challenging. These findings establish a benchmark for LLM-based indication extraction and highlight opportunities for hybrid pipelines that balance high recall with precision-oriented validation.

Speaker(s):
Neil Sarkar, PhD, MLIS
Rhode Island Quality Institute & Brown University

Author(s):
Neil Sarkar, PhD, MLIS - Rhode Island Quality Institute & Brown University;

Assessing Multimodal AI for Visual Information Extraction of Pharmacology Data

Presentation Type: Paper - Student
Presentation Time: 11:51 AM - 12:03 PM

Primary Track: Data Science/Artificial Intelligence

While Americans are using herbal dietary supplements (natural products) more than ever, the consumption of natural products with prescription drugs can lead to harmful interactions. Pharmacovigilance of natural products depends on careful expert review and interpretation of a wide variety of evidence. In prior work, we demonstrated the value of a knowledge graph (NP-KG) for assisting with natural product safety investigations. However, scaling the NP-KG from 33 natural products to the thousands on the market requires computer-assisted data extraction, particularly from visual elements (figures or tables) of pharmacology literature. We evaluated the accuracy and resilience of 8 open- and closed-source multimodal models by performing visual information extraction from select tables and images. The best-performing models could accurately extract 90% of tabular data and 45% of data reported in figures with a modified relative error rate of 0.05. Image resolution and information density were the primary hindrances to better extraction performance.

Speaker(s):
Israel Dilan-Pantojas, Bsc. Computer Science
University of Pittsburgh

Author(s):
Israel Dilan-Pantojas, Bsc. Computer Science - University of Pittsburgh; Johnny Duong, Ph.D. - University of Pittsburgh; Kevin Lopes, Bachelor of Science - Rochester Institute of Technology; Richard Boyce, PhD - University of Pittsburgh;

Generative Transformers for Pharmacovigilance Signal Detection using Electronic Health Records

Presentation Type: Paper - Regular
Presentation Time: 12:03 PM - 12:15 PM

Primary Track: Data Science/Artificial Intelligence

Adverse drug reactions (ADRs) present substantial challenges to patient safety and to healthcare systems, often leading to hospitalizations and economic hardship. To mitigate these burdens, post-market surveillance is used to monitor ADRs after the conclusion of clinical trials. The methods commonly employed to estimate the strength of associations between drugs and ADRs from surveillance data include disproportionality metrics. However, these methods are limited in their ability to model temporal relationships indicating causality. Furthermore, they perform poorly when applied to comprehensive data from electronic health records (EHRs), failing to realize such data's potential to mitigate under-reporting and bias in ADR surveillance system data. To address these limitations, we propose novel methods using generative pre-trained transformers (GPT) for enhanced ADR signal detection in EHR data. In evaluations on data from two healthcare systems, the GPT models improved overall AUROC, with an absolute gain of 6–16% over established baseline methods. This study highlights the potential of transformer-based models to advance pharmacovigilance by integrating comprehensive clinical data.

Speaker(s):
Yifan Wu, MPH, PhD
Johnson and Johnson Innovative Medicine

Author(s):
Trevor Cohen, MBChB, PhD - Biomedical Informatics and Medical Education, University of Washington; Ian De Boer, MS, MD - University of Washington;

Evaluating Large Language Models and LLM Agents for Insurance Genomics Workflows: From Network Lab Identification to QnA

Presentation Type: Podium Abstract
Presentation Time: 12:15 PM - 12:27 PM

Primary Track: Data Science/Artificial Intelligence

We evaluated large language model agents for genomic insurance workflows, including in-network payer identification, policy document retrieval, and policy-grounded QnA. GPT-5-Mini achieved the strongest and most stable performance, while other models showed variability or limited coverage. Access to the correct policy document substantially improved QnA accuracy. Results highlight opportunities and remaining challenges for reliable LLM-based automation of genetic testing coverage processes.

Speaker(s):
Junyoung Kim, MA
Boston Children's Hospital

Author(s):
Cong Liu, PhD - Boston Children's Hospital;

TRI16: AI for Drug Discovery, Safety, and Pharmacovigilance (Oral Presentation)
