American Medical Informatics Association - Towards Early Prediction of Amyotrophic Lateral Sclerosis Empowered by Machine Learning and Clinical Big Data

Multimodal EHR-Based Prediction of Pediatric Asthma Exacerbations

Presentation Type: Paper - Regular

Presentation Time: 02:00 PM - 02:12 PM

Primary Track: Data Science/Artificial Intelligence

Pediatric asthma exacerbations are a frequent cause of emergency department (ED) visits and hospitalizations, yet accurate risk prediction remains limited and no consensus risk scores exist. Using UF Health electronic health records (EHRs) from 2011-2023, we evaluated two computable phenotypes (i.e., CAPriCORN and COMPAC) to predict exacerbations over 6-, 12-, and 24-month horizons. Exacerbations were defined using a validated composite of diagnosis codes from ED, inpatient, or outpatient encounters combined with systemic corticosteroid prescriptions. Several commonly used machine learning (ML) models were trained with stratified five-fold cross-validation, Bayesian hyperparameter optimization, and Youden’s J thresholding. XGBoost achieved the best performance, with SHapley Additive exPlanations (SHAP) highlighting note-derived symptom terms and rescue-medication use as dominant predictors. Future work will focus on external validation and assessment of generalizability. This interpretable, text-integrated framework may support child-specific risk stratification and inform EHR-based decision support for timely pediatric asthma management.
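
The abstract names Youden’s J thresholding without describing it. As a minimal sketch, assuming the usual definition J = sensitivity + specificity - 1 and a "score at or above the cut-off means positive" rule, the threshold search could look like the following. The function name `youden_threshold` and the toy labels and scores are hypothetical and are not taken from the study.

```python
def youden_threshold(y_true, y_score):
    """Return (threshold, J) maximizing J = sensitivity + specificity - 1."""
    positives = sum(1 for y in y_true if y == 1)
    negatives = len(y_true) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative label")

    best_threshold, best_j = None, -1.0
    # Each distinct score is a candidate cut-off; predict positive when score >= cut-off.
    for threshold in sorted(set(y_score)):
        tp = sum(1 for y, s in zip(y_true, y_score) if y == 1 and s >= threshold)
        tn = sum(1 for y, s in zip(y_true, y_score) if y == 0 and s < threshold)
        j = tp / positives + tn / negatives - 1.0
        if j > best_j:  # strict comparison keeps the lowest cut-off on ties
            best_threshold, best_j = threshold, j
    return best_threshold, best_j


# Toy labels and scores, invented purely for illustration.
labels = [0, 0, 0, 1, 0, 1, 1, 1]
scores = [0.10, 0.20, 0.35, 0.40, 0.55, 0.60, 0.80, 0.90]
threshold, j = youden_threshold(labels, scores)
print(threshold, j)
```

In a cross-validated setup the cut-off would typically be chosen on training or validation folds only and then applied unchanged to held-out data; the abstract does not say which variant was used.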

Speaker(s):
Zhengkang Fan, Master of computer science
University of Florida

Author(s):
Zhengkang Fan, Master of computer science - University of Florida; Jinqian Pan, Master - University of Florida, HOBI; Mengxian Lyu, Master - University of Florida; Renjie Liang, Master - University of Florida, HOBI; Chengkun Sun, Master - University of Florida; Yonghui Wu, PhD - University of Florida; David Fedele, PhD - Center for Healthcare Delivery Science, Nemours Children’s Health; Jennifer Fishe, MD - Department of Emergency Medicine, University of Florida; Jie Xu, PhD - University of Florida;

A Domain-Based Stacked Ensemble Model for Predicting Postoperative Delirium Using Comprehensive Perioperative EHR Data: Case-Control Study

Presentation Type: Podium Abstract

Presentation Time: 02:12 PM - 02:24 PM

Primary Track: Data Science/Artificial Intelligence

Postoperative delirium (POD) affects 15-50% of surgical patients, causing prolonged hospitalization, cognitive decline, and increased mortality, yet 60% of cases remain undetected. Existing POD prediction models show limited generalizability, are often procedure-specific, rely exclusively on either preoperative or intraoperative variables, and emphasize discrimination while neglecting calibration.
We developed an interpretable domain-based stacked ensemble model using comprehensive perioperative electronic health record data. This retrospective case-control study utilized data from the Indiana Network for Patient Care (2012-2022) including adults undergoing non-cardiac, non-obstetric surgery. POD cases were identified using combined ICD codes and positive Confusion Assessment Method (CAM) assessments within 7 days postoperatively. Cases were 1:1 matched with controls by age, sex, race, and surgery year, yielding 5,729 encounters.
We engineered 87 features across three clinician-informed domains: patient-related (demographics, ASA class, Charlson Comorbidity Index, comorbidities, medications), surgery-related (duration, specialties), and anesthetic-related (hemodynamics, anesthetics, vasopressors, antihypertensives). Six baseline models (L2 logistic regression, elastic-net logistic regression, Random Forest, Histogram Gradient Boosting, XGBoost variants) were developed. A two-stage stacked ensemble employed three domain-specific LightGBM classifiers (600 estimators, learning_rate=0.03, subsample=0.8) trained via 5-fold patient-grouped cross-validation, with outputs fed to a logistic regression meta-learner.
The ensemble achieved AUROC=0.905, Brier score=0.123, and decision curve analysis net benefit=0.322. Surgery-related features dominated predictions (97.3% meta-learner contribution), with operative duration and emergency status as top predictors via TreeSHAP analysis. Patient-related (ASA class, psychiatric disorders, age, respiratory failure) and anesthetic-related factors (MAC-hours, propofol, neuromuscular blockers) contributed <2% combined. This interpretable model supports precision prevention strategies for POD.
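
The abstract reports a Brier score and a decision-curve net benefit alongside AUROC. As a rough sketch of how these two quantities are conventionally computed, assuming the standard definitions (mean squared error of predicted probabilities, and TP/n - FP/n × pt/(1 - pt) at threshold probability pt), one might write the following. The function names and toy values are hypothetical, and the threshold probability behind the reported net benefit is not stated in the abstract.

```python
def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and 0/1 outcome."""
    if not y_true:
        raise ValueError("empty input")
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def net_benefit(y_true, y_prob, threshold_prob):
    """Decision-curve net benefit when treating patients with p >= threshold_prob."""
    if not 0.0 < threshold_prob < 1.0:
        raise ValueError("threshold_prob must lie strictly between 0 and 1")
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if y == 1 and p >= threshold_prob)
    fp = sum(1 for y, p in zip(y_true, y_prob) if y == 0 and p >= threshold_prob)
    # False positives are weighted by the odds of the threshold probability.
    return tp / n - (fp / n) * threshold_prob / (1.0 - threshold_prob)


# Toy outcomes and probabilities, invented purely for illustration.
outcomes = [1, 0, 1, 0]
probabilities = [0.9, 0.2, 0.6, 0.7]
bs = brier_score(outcomes, probabilities)
nb = net_benefit(outcomes, probabilities, threshold_prob=0.5)
print(bs, nb)
```

A lower Brier score indicates better-calibrated and sharper probabilities, while net benefit is usually plotted over a range of threshold probabilities rather than quoted at a single point.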

Speaker(s):
Cristina Barboi, MD
Indiana University

Author(s):
Cristina Barboi, MD - Indiana University; Shikhar Shukla, MS Health Informatics - Indiana University;

Towards Early Prediction of Amyotrophic Lateral Sclerosis Empowered by Machine Learning and Clinical Big Data

Presentation Type: Paper - Student

Student Paper Competition Nominee

Presentation Time: 02:24 PM - 02:36 PM

Primary Track: Clinical Research Informatics

Amyotrophic lateral sclerosis (ALS) is a progressive neurodegenerative disorder characterized by substantial symptom heterogeneity and overlap with other neurological conditions, often delaying diagnosis. This study developed a consensus-based feature selection framework to identify a stable and parsimonious minimal feature set for early ALS prediction using large-scale observational data. Using multi-year medical claims and multi-site EHRs, we identified 1,716 ALS cases with matched controls. The approach integrated variability across sample, task, and model dimensions to isolate features predictive up to 18 months before diagnosis. Predictive models using LASSO regression and GBT were evaluated with AUROC and classification metrics. The resulting nine-feature set achieved AUROC values above 0.85 across time windows. The GBT model was further evaluated in musculoskeletal, nervous system, and limb or bulbar subgroups, demonstrating reliable discrimination and preserved sensitivity and specificity. These findings highlight the potential of stable minimal feature sets to support earlier ALS identification.
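
The abstract does not spell out how the consensus across sample, task, and model dimensions is formed. One generic way to obtain a stable feature set, shown here only as an illustrative sketch and not as the authors' method, is to run feature selection many times (for example over resamples, prediction windows, and model types) and keep the features chosen in at least a given fraction of runs. The function name `consensus_features`, the `min_fraction` parameter, and the placeholder feature names are all assumptions.

```python
from collections import Counter


def consensus_features(selections, min_fraction=0.8):
    """Keep features selected in at least `min_fraction` of the runs.

    `selections` is a list of iterables, one per run; each run might correspond
    to a different resample, prediction window, or model (an assumption here).
    """
    if not selections:
        raise ValueError("need at least one selection run")
    n_runs = len(selections)
    # Count each feature at most once per run.
    counts = Counter(f for selected in selections for f in set(selected))
    return sorted(f for f, c in counts.items() if c / n_runs >= min_fraction)


# Placeholder feature names, invented purely for illustration.
runs = [
    {"feat_a", "feat_b", "feat_c"},
    {"feat_a", "feat_b"},
    {"feat_a", "feat_c", "feat_d"},
    {"feat_a", "feat_b", "feat_d"},
]
stable = consensus_features(runs, min_fraction=0.75)
print(stable)
```

Raising `min_fraction` trades a smaller, more stable set against the risk of discarding features that matter only in some time windows or subgroups.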

Speaker(s):
Askar Safipour Afshar, PhD student
University of Missouri Columbia

Author(s):
Askar Safipour Afshar, PhD student - University of Missouri Columbia; Jefferey Statland, MD - University of Kansas Medical Center; Xing Song, PhD - University of Missouri;

Patient-Centered Stroke Detection in High-Risk Groups Leveraging Graph-Augmented Large Language Models

Presentation Type: Podium Abstract

Presentation Time: 02:36 PM - 02:48 PM

Primary Track: Clinical Research Informatics

We developed a graph-augmented LLM pipeline to identify clinically meaningful stroke-related symptoms from patient-initiated messages and evaluated a rule-based screening rubric in high-risk individuals with diabetes. Using dual ML signals (GNN importance and elastic-net associations), we derived High- and Moderate-risk symptom sets and simulated 7-day and 14-day screening. The rubric achieved PPV 1.00 with sensitivities of 0.46 (7-day) and 0.60 (14-day), demonstrating the potential of message-based early stroke detection.
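
To make the reported metrics concrete: a PPV of 1.00 means every flagged patient was a true case, while a sensitivity below 1 means some true cases were not flagged. The short sketch below computes both from confusion counts. The function name and the counts are hypothetical and are not the study's data.

```python
def ppv_and_sensitivity(tp, fp, fn):
    """Return (PPV, sensitivity) from true-positive, false-positive, false-negative counts."""
    flagged = tp + fp       # everyone the rubric flagged
    actual = tp + fn        # everyone who truly had the event
    if flagged == 0 or actual == 0:
        raise ValueError("PPV or sensitivity is undefined for these counts")
    return tp / flagged, tp / actual


# Hypothetical counts: no false alarms, but half of the true cases are missed.
ppv, sensitivity = ppv_and_sensitivity(tp=5, fp=0, fn=5)
print(ppv, sensitivity)
```

A screening rubric with this profile rarely raises false alarms but cannot be used alone to rule out an event.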

Speaker(s):
Jiyeong Kim, PhD
Stanford University

Author(s):
Jiyeong Kim, PhD - Stanford University; Stephen Ma, MD, PhD - Stanford University School of Medicine; Jonathan Chen, MD, PhD - Stanford University Hospital; Julia Adler-Milstein, PhD, FACMI - UCSF School of Medicine;

ScreeningPaL: LLM-NLP Enabled Early Autism Detection Method from Caregiver’s Free-Text Input

Presentation Type: Paper - Student

Student Paper Competition Nominee

Presentation Time: 02:48 PM - 03:00 PM

Primary Track: Data Science/Artificial Intelligence

Identifying autism traits and detecting autism spectrum disorder early can substantially improve quality of life. We present a text-driven approach for early autism risk detection that analyzes caregiver-reported behavioral descriptions using advanced natural language processing techniques. Synthetic free-text generated from validated screening items is used to train multiple language models, which are then evaluated on an external benchmark dataset (TASD) to assess generalization under domain shift. Fine-tuned transformer models achieve the highest performance, reaching 90% accuracy and outperforming GPT and Gemini models as well as conventional NLP baselines. Augmenting training datasets with noisy, realistic text further improves model performance, specifically recall in traditional pipelines, demonstrating the potential of noise-aware data augmentation for free-text screening. This methodology enables translational and low-cost early assessment without requiring structured questionnaires or speech samples. The approach provides early cues that may assist specialist evaluation and promote accessible, proactive developmental health monitoring.

Speaker(s):
Sumaiya Afroz Mila, MSc
University of Florida

Author(s):
Jeba Maliha, BSc - Central Michigan University; Md Rafiul Kabir, PhD - Central Michigan University; Ankan Ghosh, BTech - University of Florida; Sandip Ray, PhD - University of Florida;

Multimodal Fusion of Clinical and Imaging Data for Early Prediction of Massive Transfusion in Trauma

Presentation Type: Paper - Regular

Presentation Time: 03:00 PM - 03:12 PM

Primary Track: Data Science/Artificial Intelligence

Massive transfusion (MT) prediction in trauma remains limited by scoring systems with low sensitivity and specificity and poor generalizability. Leveraging advances in multimodal deep learning, we developed a joint fusion architecture integrating structured electronic health record (EHR) data with admission chest X-rays to improve early MT prediction. We analyzed 33,824 trauma patients (2014–2024), including 435 MT cases (1.3%). Structured variables were modeled using a multilayer perceptron, while imaging features were extracted using a pretrained DenseNet-121; feature-level fusion enabled joint representation learning. Class imbalance was addressed using SMOTE for structured data and geometric augmentation for images. The fusion model achieved an AUC of 0.669, surpassing unimodal models. At the optimal threshold, sensitivity (0.72) and specificity (0.84) exceeded the performance of the ABC score benchmark. Grad-CAM visualizations highlighted attention over thoracic and upper abdominal injury regions. These findings demonstrate the feasibility of real-world multimodal MT prediction and support future multi-site validation.
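
With MT cases making up only 1.3% of patients, the abstract mentions SMOTE for the structured data. The core idea of SMOTE is to create synthetic minority samples by interpolating between a minority point and one of its nearest minority neighbours. The sketch below is a simplified, standard-library-only illustration of that interpolation step; it is not the authors' pipeline, and the function name `smote_like`, its parameters, and the toy points are hypothetical.

```python
import random


def smote_like(minority, n_new, k=3, seed=0):
    """Generate `n_new` synthetic points by SMOTE-style interpolation.

    Simplification: base points are drawn at random rather than looping over
    every minority sample a fixed number of times.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base point (squared Euclidean distance).
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment from base to neighbour
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


# Toy two-feature minority samples, invented purely for illustration.
minority_points = [[1.0, 2.0], [1.5, 1.8], [2.0, 2.5], [1.2, 2.2]]
new_points = smote_like(minority_points, n_new=5)
print(len(new_points))
```

Oversampling of this kind is normally applied to training folds only, so that synthetic points never leak into the data used for evaluation.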

Speaker(s):
Michael Kolesnikov, MSN, PhD
OHSU

Author(s):
Michael Kolesnikov, MSN, PhD - OHSU; Phillip Jenkins, MD - OHSU; Vishnu Mohan, MD, MBI, FACP, FAMIA - Oregon Health & Science University; Laszlo Kiraly, MD - OHSU; Karen Eden, PhD - Oregon Health & Science University; Steven Bedrick, PhD - Oregon Health & Science University; Anais Kolesnikov, BS - Oregon Health and Science University;

TRI30: Clinical Risk, Prediction, and Outcomes (Oral Presentations)
