Times are displayed in (UTC-04:00) Eastern Time (US & Canada)
3/10/2025 | 3:30 PM – 5:00 PM | Conference A
S05: Predictive Modeling: Understanding Risk
Presentation Type: Podium Abstract
Session Credits: 1.5
Session Chair:
Shauna Overgaard, PhD - Mayo Clinic
Suicide Phenotyping from Clinical Notes in Safety-Net Psychiatric Hospital Using Multi-Label Classification With Pre-Trained Language Models
Presentation Time: 03:30 PM - 03:45 PM
Abstract Keywords: EHR-based Phenotyping, Natural Language Processing, Secondary Use of EHR Data, Data-Driven Research and Discovery
Primary Track: Data Science/Artificial Intelligence
Programmatic Theme: Harnessing the Power of Large Language Models in Health Data Science
Accurate identification and categorization of suicidal events can yield better suicide precautions, reduce operational burden, and improve care quality in high-acuity psychiatric settings. Pre-trained language models offer promise for identifying suicidality from unstructured clinical narratives. We evaluated the performance of four BERT-based models using two fine-tuning strategies (multiple single-label classifiers and a single multi-label classifier) for detecting coexisting suicidal events in 500 annotated psychiatric evaluation notes. The notes were labeled for suicidal ideation (SI), suicide attempts (SA), exposure to suicide (ES), and non-suicidal self-injury (NSSI). Under binary relevance (multiple single-label classifiers), RoBERTa outperformed the other models (acc=0.86, F1=0.78). MentalBERT (F1=0.74) also exceeded BioClinicalBERT (F1=0.72). RoBERTa fine-tuned as a single multi-label classifier further improved performance (acc=0.88, F1=0.81), highlighting that models pre-trained on domain-relevant data and single multi-label strategies enhance efficiency and performance.
Speaker(s):
Zehan (Leo) Li, PhD
The University of Texas Health Science Center at Houston (UTHealth) School of Biomedical Informatics
Author(s):
Yan Hu, MS - UTHealth Science Center Houston; Hongfang Liu, PhD - University of Texas Health Science Center at Houston; Hua Xu, Ph.D - Yale University; Ming Huang, PhD - UTHealth Houston;
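To make the multi-label setup in the abstract above concrete, here is a minimal sketch of the output side only: one logit per label, an independent sigmoid for each, and a micro-averaged F1. It is an illustration under stated assumptions, not the authors' pipeline. The 0.5 threshold, the toy logits, and the choice of micro averaging are all assumptions; the abstract does not say how its F1 was averaged. With binary relevance, each logit would come from a separate fine-tuned model; with a single multi-label classifier, one model would emit all four.

```python
import math

LABELS = ["SI", "SA", "ES", "NSSI"]  # label set named in the abstract


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def predict_multilabel(logits, threshold=0.5):
    """Map one logit per label to a multi-hot vector.

    Each label gets its own sigmoid, so several events can be predicted
    for the same note. The 0.5 threshold is an assumption.
    """
    return [1 if sigmoid(z) >= threshold else 0 for z in logits]


def micro_f1(y_true, y_pred):
    """Micro-averaged F1 over all (note, label) pairs."""
    tp = fp = fn = 0
    for true_row, pred_row in zip(y_true, y_pred):
        for t, p in zip(true_row, pred_row):
            if t == 1 and p == 1:
                tp += 1
            elif t == 0 and p == 1:
                fp += 1
            elif t == 1 and p == 0:
                fn += 1
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical logits and gold labels for two notes (not data from the study).
toy_logits = [[2.0, -1.0, 0.3, -3.0], [-2.0, 1.5, -0.4, 0.2]]
toy_truth = [[1, 0, 1, 0], [0, 1, 0, 0]]
toy_pred = [predict_multilabel(row) for row in toy_logits]

for labels_row in toy_pred:
    print([name for name, flag in zip(LABELS, labels_row) if flag])
print("micro-F1:", round(micro_f1(toy_truth, toy_pred), 3))
```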
Predicting Natural Product-Drug Interactions with Knowledge Graph Embeddings
Presentation Time: 03:45 PM - 04:00 PM
Abstract Keywords: Knowledge Representation, Management, or Engineering, Data Mining and Knowledge Discovery, Machine Learning, Generative AI, and Predictive Modeling, Informatics Research/Biomedical Informatics Research Methods
Primary Track: Data Science/Artificial Intelligence
Programmatic Theme: Proactive Machine Learning in Biomedical Applications: The Power of Generative AI and Reinforcement Learning
Natural product-drug interactions (NPDIs) occurring due to concomitant exposure to botanical products and prescription drug therapies could lead to adverse events or reduced treatment efficacy. To better understand and address potential safety concerns, researchers investigate the underlying NPDI mechanisms using in vitro and clinical studies. Given that natural products are complex mixtures of compounds that are often not well characterized, it is important to advance computational methods for novel NPDI research. Biomedical knowledge graphs (KGs) can aid in identifying potential mechanisms to support such research efforts. We evaluated the ability of several KG embedding methods to improve NPDI prediction on NP-KG, a large-scale, heterogeneous, biomedical KG. We found that the ComplEx model outperformed other KG embedding approaches in both intrinsic and extrinsic evaluations. Future work will focus on utilizing the embeddings to identify underlying mechanisms of novel, potential NPDIs.
Speaker(s):
Sanya Taneja, MS
University of Pittsburgh
Author(s):
Richard Boyce, PhD - University of Pittsburgh; Israel Dilan-Pantojas, Bsc. Computer Science - University of Pittsburgh;
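For readers unfamiliar with ComplEx, the model named in the abstract above, the following sketch shows its scoring function: the real part of the sum of head times relation times the complex conjugate of the tail, element by element. The embedding values are hypothetical and only two-dimensional; nothing here is taken from NP-KG or from the authors' training setup.

```python
def complex_score(head, rel, tail):
    """ComplEx triple score: Re(sum_i head_i * rel_i * conj(tail_i)).

    Higher scores mean the model considers the triple more plausible.
    """
    total = sum(h * r * t.conjugate() for h, r, t in zip(head, rel, tail))
    return total.real


# Hypothetical 2-dimensional complex embeddings (illustrative values only).
natural_product = [1 + 1j, 0.5 - 0.5j]
interacts_with = [1j, 1 + 0j]
drug = [1 - 1j, 0.5 + 0.5j]

forward = complex_score(natural_product, interacts_with, drug)
reverse = complex_score(drug, interacts_with, natural_product)
print(forward, reverse)
```

Because the tail is conjugated, swapping head and tail can change the score, which is what lets ComplEx represent asymmetric relations. Link prediction then amounts to ranking candidate tails by this score.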
Opioids Overdose Death Prediction with Graph Neural Networks
Presentation Time: 04:00 PM - 04:15 PM
Abstract Keywords: Public Health Informatics, Machine Learning, Generative AI, and Predictive Modeling, Data Mining and Knowledge Discovery
Primary Track: Data Science/Artificial Intelligence
Programmatic Theme: Health Data Science and Artificial Intelligence Innovation: From Single-Center to Multi-Site
The United States opioid crisis persists, presenting an urgent need for effective surveillance and management. Existing research has utilized various data-driven approaches to forecast adverse events but lacks the capacity to fully capture the complex spatial and temporal dynamics of opioid overdose incidents. To address this gap, we introduce the Spatial-Temporal Graph Neural Network (ST-GNN), tailored for opioid overdose prediction. This model integrates a county-level graph to model information propagation through spatial relationships. Additionally, a long short-term memory (LSTM) model is applied at every node to capture the temporal evolution of each county. We trained and tested our ST-GNN model with data from Ohio and demonstrate superior performance in predicting the opioid overdose death rate, offering promising new directions for mitigating the opioid crisis.
Speaker(s):
Changchang Yin, MS
The Ohio State University
Author(s):
Zishan Gu, Master - OSU; Changchang Yin, M.S. - The Ohio State University; Naleef Fareed, PhD MBA - The Ohio State University Dept Biomedical Informatics; Soledad Fernandez - The Ohio State University Dept of Biomedical Informatics; Ping Zhang, PhD, FAMIA - The Ohio State University;
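The two ingredients described in the abstract above, spatial propagation over a county graph and an LSTM at each node, can be sketched in a few lines of plain Python. This is a toy illustration, not the ST-GNN architecture itself: mean aggregation over neighbours is an assumption, the LSTM is a scalar cell with untrained weights, and the county names, adjacency, and rates are hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def spatial_mean(values, neighbors):
    """Average each county's value with its neighbours' values."""
    out = {}
    for county, value in values.items():
        group = [value] + [values[n] for n in neighbors.get(county, [])]
        out[county] = sum(group) / len(group)
    return out


def lstm_step(x, h, c, w):
    """One step of a scalar LSTM cell (standard gate equations)."""
    i = sigmoid(w["wi"] * x + w["ui"] * h + w["bi"])    # input gate
    f = sigmoid(w["wf"] * x + w["uf"] * h + w["bf"])    # forget gate
    o = sigmoid(w["wo"] * x + w["uo"] * h + w["bo"])    # output gate
    g = math.tanh(w["wg"] * x + w["ug"] * h + w["bg"])  # candidate state
    c_new = f * c + i * g
    h_new = o * math.tanh(c_new)
    return h_new, c_new


def run_node(series, w):
    """Run the LSTM cell over one county's time series; return final hidden state."""
    h = c = 0.0
    for x in series:
        h, c = lstm_step(x, h, c, w)
    return h


# Hypothetical three-county chain A - B - C with two time steps of rates.
NEIGHBORS = {"A": ["B"], "B": ["A", "C"], "C": ["B"]}
RATES = [{"A": 3.0, "B": 6.0, "C": 9.0}, {"A": 4.0, "B": 5.0, "C": 9.0}]
# Untrained placeholder weights; a real model would learn these.
WEIGHTS = {name: 0.1 for name in (
    "wi", "ui", "bi", "wf", "uf", "bf", "wo", "uo", "bo", "wg", "ug", "bg")}

smoothed = [spatial_mean(step, NEIGHBORS) for step in RATES]
hidden = {county: run_node([s[county] for s in smoothed], WEIGHTS)
          for county in NEIGHBORS}
print(hidden)
```

In a trained model, the final hidden state of each county would feed a regression head that outputs the predicted death rate.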
Intelligent Patient Monitoring for Enhanced Forecasting of SpO2 Instability
Presentation Time: 04:15 PM - 04:30 PM
Abstract Keywords: Machine Learning, Generative AI, and Predictive Modeling, Data Quality, Clinical Decision Support for Translational/Data Science Interventions
Primary Track: Clinical Research Informatics
Programmatic Theme: Translational Bioinformatics Using Multi-Modal Patient Data and AI
Rapid detection of cardiorespiratory instability (CRI) in critical care units is crucial for improving patient outcomes, yet traditional monitoring systems often lack timely insights into patient deterioration. We present a machine learning framework that leverages high- and low-frequency plethysmogram data, along with electrocardiogram (ECG) and respiratory rate signals, to detect and forecast peripheral oxygen saturation (SpO₂) instability in real time. Physiological data were collected from patients, and over 400 features were derived to capture signal variations and trends. An artifact discrimination step, using a random forest classifier and a kernel density decision tree, addressed artifacts caused by patient movement or sensor issues. Clinical experts enhanced data labeling through an iterative annotation process utilizing active learning, ensuring high-quality training data. The artifact discriminator achieved an area under the receiver operating characteristic curve (AUC) of 0.97. Of the 20-second intervals where SpO₂ fell below 90%, 71% were confidently classified as non-artifacts and 17% as artifacts. After data cleaning, over two million non-artifact intervals were aggregated into 48,238 CRI episodes from 3,684 patients with a 17% mortality rate. A random forest classifier produced risk scores every five minutes, significantly increasing when approaching CRI events compared to stable periods. Our results demonstrate the feasibility of deploying machine learning models for timely detection and forecasting of SpO₂ instability in critical care settings, offering clinicians a valuable tool to improve patient outcomes through timely interventions.
Speaker(s):
Chi-En Teh, PhD
Carnegie Mellon University
Author(s):
Vedant Sanil, MSc - Carnegie Mellon University; Gus Welter, MSc - Carnegie Mellon University; Karina Kraevsky-Phillips, PhD(c), MA, BSN, RN, CCRN - University of Pittsburgh; Marilyn Hravnak, PhD - University of Pittsburgh; Artur Dubrawski, PhD - Carnegie Mellon University; Gilles Clermont, MD, MSc - University of Pittsburgh; Salah Al-Zaiti, PhD - University of Rochester;
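The abstract above mentions that non-artifact 20-second intervals with SpO₂ below 90% were aggregated into CRI episodes, without giving the aggregation rule. The sketch below shows one simple possibility, merging back-to-back qualifying intervals into a single episode. The merging rule, the `max_gap` parameter, and the sample readings are assumptions made for illustration.

```python
INTERVAL_SECONDS = 20   # interval length stated in the abstract
SPO2_THRESHOLD = 90.0   # instability threshold stated in the abstract


def find_episodes(intervals, max_gap=0):
    """Merge qualifying intervals into (start, end) episodes.

    intervals: list of (start_second, spo2, is_artifact), sorted by start.
    An interval qualifies if it is not an artifact and spo2 < threshold.
    Qualifying intervals separated by at most max_gap seconds are merged
    (assumed rule; the abstract does not specify one).
    """
    episodes = []
    for start, spo2, is_artifact in intervals:
        if is_artifact or spo2 >= SPO2_THRESHOLD:
            continue
        end = start + INTERVAL_SECONDS
        if episodes and start - episodes[-1][1] <= max_gap:
            episodes[-1][1] = end      # extend the current episode
        else:
            episodes.append([start, end])
    return [tuple(e) for e in episodes]


# Hypothetical readings: (start_second, SpO2 %, flagged as artifact?)
readings = [
    (0, 95.0, False),
    (20, 88.0, False),
    (40, 87.0, False),
    (60, 85.0, True),    # low value, but discarded as an artifact
    (80, 89.0, False),
    (100, 96.0, False),
]
print(find_episodes(readings))
```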
Integrating Social Determinants of Health into Knowledge Graphs: Evaluating Prediction Bias and Fairness in Healthcare
Presentation Time: 04:30 PM - 04:45 PM
Abstract Keywords: Social Determinants of Health, Fairness and Disparity Research in Health Informatics, Knowledge Representation, Management, or Engineering, Data Mining and Knowledge Discovery, Data Integration
Primary Track: Data Science/Artificial Intelligence
Programmatic Theme: Health Data Science and Artificial Intelligence Innovation: From Single-Center to Multi-Site
Social determinants of health (SDoH) play a crucial role in patient health outcomes, yet their integration into biomedical knowledge graphs remains underexplored. This study addresses this gap by constructing an SDoH-enriched knowledge graph using the MIMIC-III dataset and PrimeKG. We introduce a novel fairness formulation for graph embeddings, focusing on invariance with respect to sensitive SDoH information. Employing a heterogeneous-GCN model for drug-disease link prediction, we detect biases related to various SDoH factors. To mitigate these biases, we propose a post-processing method that strategically reweights edges connected to SDoHs, balancing their influence on graph representations. This approach represents one of the first comprehensive investigations into fairness issues within biomedical knowledge graphs incorporating SDoH. Our work not only highlights the importance of considering SDoH in medical informatics but also provides a concrete method for reducing SDoH-related biases in link prediction tasks, paving the way for more equitable healthcare recommendations. Our code is available at https://github.com/hwq0726/SDoH-KG.
Speaker(s):
Weiqing He, bachelor
University of Pennsylvania
Author(s):
Tianqi Shang, Master of Engineer in Computer Science - University of Pennsylvania; Weiqing He, bachelor - University of Pennsylvania; Tianlong Chen, PhD - The University of North Carolina at Chapel Hill; Ying Ding - University of Texas at Austin; Huanmei Wu, FAMIA, PhD - Temple University; Kaixiong Zhou, PhD - University of North Carolina at Chapel Hill; Li Shen, Ph.D. - University of Pennsylvania;
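The idea of reweighting edges connected to SDoH nodes, as described in the abstract above, can be illustrated with a weighted neighbour average in which SDoH edges are scaled by a factor `alpha`. This is only a sketch of the general idea; the authors' actual reweighting scheme is in their repository and is not reproduced here. The node names, scalar features, and the single global `alpha` are hypothetical.

```python
def aggregate(node, edges, features, sdoh_nodes, alpha=1.0):
    """Weighted mean of a node's neighbour features.

    Edges that touch an SDoH node get weight alpha; all others get 1.0.
    alpha < 1 reduces the influence of SDoH information on the result.
    """
    total = 0.0
    weight_sum = 0.0
    for u, v in edges:
        if node not in (u, v):
            continue
        neighbour = v if u == node else u
        touches_sdoh = u in sdoh_nodes or v in sdoh_nodes
        weight = alpha if touches_sdoh else 1.0
        total += weight * features[neighbour]
        weight_sum += weight
    return total / weight_sum if weight_sum else 0.0


# Hypothetical toy graph with scalar node features.
EDGES = [("patient", "diabetes"), ("patient", "housing_instability")]
FEATURES = {"patient": 0.0, "diabetes": 1.0, "housing_instability": 3.0}
SDOH = {"housing_instability"}

for a in (1.0, 0.5, 0.0):
    print(a, aggregate("patient", EDGES, FEATURES, SDOH, alpha=a))
```

As `alpha` shrinks toward zero, the patient representation converges to what it would be without the SDoH edge, which is the invariance that the fairness formulation targets.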
EntroLLM: Leveraging Entropy and Large Language Model Embeddings for Enhanced Risk Prediction with Wearable Device Data
Presentation Time: 04:45 PM - 05:00 PM
Abstract Keywords: Mobile Health, Wearable Devices and Patient-Generated Health Data, Machine Learning, Generative AI, and Predictive Modeling, Outcomes Research, Clinical Epidemiology, Population Health
Primary Track: Data Science/Artificial Intelligence
Programmatic Theme: Harnessing the Power of Large Language Models in Health Data Science
Wearable devices collect high-dimensional, complex, structured time-series data that are challenging for traditional models to handle efficiently. We propose EntroLLM, a new method that combines entropy measures with embeddings (i.e., low-dimensional representations) generated by large language models (LLMs) to enhance risk prediction using wearable device data. In EntroLLM, the entropy quantifies the variability of a subject’s physical activity patterns, while the LLM embedding approximates the latent temporal structure. We evaluate the feasibility and performance of EntroLLM using NHANES data to predict overweight status based on demographics and physical activity collected from wearable devices. Results show that combining entropy with GPT-based embeddings significantly improves model performance compared to baseline models and other embedding techniques, leading to an average increase in AUC from 0.56 to 0.64. EntroLLM showcases the potential of combining entropy and LLM-based embeddings and offers a promising approach to wearable device data analysis for predicting health outcomes.
Speaker(s):
Xueqing Huang, Master of Science
Columbia University
Author(s):
Xueqing Huang, Master of Science - Columbia University; Tian Gu, PhD - Columbia University;
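As a rough illustration of how an entropy feature can summarize activity variability, as the abstract above describes, the sketch below computes the Shannon entropy of a subject's activity counts across time bins and appends it to an embedding vector. The abstract does not state which entropy measure EntroLLM uses, so Shannon entropy is an assumption, and the embedding values are placeholders rather than output from any LLM.

```python
import math


def shannon_entropy(counts):
    """Shannon entropy (in bits) of activity counts normalized to a distribution."""
    total = sum(counts)
    if total == 0:
        return 0.0
    probs = [c / total for c in counts if c > 0]
    return -sum(p * math.log2(p) for p in probs)


def build_features(counts, embedding):
    """Concatenate the entropy feature with an embedding vector."""
    return [shannon_entropy(counts)] + list(embedding)


# Hypothetical activity counts in four time bins for two subjects.
steady = [10, 10, 10, 10]    # activity spread evenly across bins
bursty = [40, 0, 0, 0]       # all activity in one bin

# Placeholder embedding; a real pipeline would obtain this from an LLM.
fake_embedding = [0.12, -0.07, 0.33]

print(shannon_entropy(steady), shannon_entropy(bursty))
print(build_features(steady, fake_embedding))
```

The resulting feature vector could then be passed, together with demographics, to any standard classifier.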