American Medical Informatics Association - Unmasking Toxic Mimicry in Medical Offline Reinforcement Learning for ICU Sepsis Management via Counterfactual Clinical Audits

Unmasking Toxic Mimicry in Medical Offline Reinforcement Learning for ICU Sepsis Management via Counterfactual Clinical Audits

Presentation Type: Paper - Regular
Presentation Time: 03:30 PM - 03:42 PM

Abstract Keywords: Causal Inference, Deep Learning, Critical Care, Artificial Intelligence, Clinical Decision Support, Evaluation
Programmatic Theme: Clinical Informatics

Offline reinforcement learning (RL) offers considerable promise for optimizing ICU treatment decisions, yet the standard evaluation metrics, Mean Squared Error (MSE) and Fitted Q-Evaluation (FQE), assess only behavioral imitation and cannot detect Toxic Mimicry, a failure mode in which agents replicate harmful patterns such as treatment withdrawal during comfort-care transitions. Using the MIMIC-III database, we propose the Counterfactual Clinical Audit (CCA) framework, which stress-tests RL agents through physiological perturbations anchored in Surviving Sepsis Campaign (SSC) guidelines. We audit a Medical Decision Transformer (MedDT) and a Historical Causal Transformer (HCT-RL), the latter employing Causal Action Shielding, propensity-based importance weighting, and Conservative Q-Learning. CCA reveals that MedDT paradoxically reduces vasopressor dosage as lactate escalates, contradicting resuscitation guidelines, while HCT-RL maintains physiologically consistent responses. These findings expose a systemic misalignment between statistical fit and clinical safety, supporting counterfactual audits as a necessary evaluation standard for medical RL.
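
The audit idea described above can be illustrated with a minimal sketch. It is not the authors' CCA implementation: the two toy policies, the `lactate_audit` helper, the feature names, and the monotonicity check (dose should not fall as lactate rises) are all assumptions made for illustration.

```python
from typing import Callable, Dict, List

# A policy maps a patient state (feature name -> value) to a vasopressor dose.
Policy = Callable[[Dict[str, float]], float]


def lactate_audit(policy: Policy,
                  base_state: Dict[str, float],
                  lactate_levels: List[float]) -> Dict[str, object]:
    """Sweep lactate upward with all other features held fixed.

    Flags every step where the recommended dose drops while lactate rises.
    The monotonicity rule is a simplifying assumption, not a guideline quote.
    """
    doses = []
    for lactate in sorted(lactate_levels):
        state = dict(base_state, lactate=lactate)  # counterfactual copy
        doses.append(policy(state))
    violations = [i for i in range(1, len(doses)) if doses[i] < doses[i - 1]]
    return {"doses": doses, "violations": violations, "passed": not violations}


# Hypothetical stand-ins for trained agents (toy linear rules).
def mimicking_policy(state: Dict[str, float]) -> float:
    return max(0.0, 0.5 - 0.05 * state["lactate"])  # dose falls as lactate rises


def consistent_policy(state: Dict[str, float]) -> float:
    return 0.1 + 0.05 * state["lactate"]  # dose rises with lactate


if __name__ == "__main__":
    base = {"map": 60.0, "lactate": 2.0}  # illustrative values only
    levels = [2.0, 4.0, 6.0, 8.0]
    print(lactate_audit(mimicking_policy, base, levels)["passed"])
    print(lactate_audit(consistent_policy, base, levels)["passed"])
```

A low imitation error says nothing about either outcome here; the audit only inspects how the recommendation moves under the perturbation.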

Speaker(s):
Hangqi Ren, Master of Science
Vanderbilt University

Author(s):
Hangqi Ren, Master of Science - Vanderbilt University; Junyi Liao, Doctor of Philosophy - Duke University;

cNODE: A Continuous-Time Neural ODE Transformer for Early Sepsis Prediction

Presentation Type: Paper - Regular
Presentation Time: 03:42 PM - 03:54 PM

Abstract Keywords: Artificial Intelligence, Machine Learning, Critical Care
Programmatic Theme: Clinical Research Informatics

Early prediction of sepsis from electronic health records is challenging because clinical time series are sparse, irregularly sampled, and heavily affected by missingness. We propose a continuous-time Transformer framework that explicitly models patient trajectories along clinical time. The model integrates missingness-aware token representations with neural ordinary differential equation (ODE)-based latent dynamics, allowing patient states to evolve continuously between observations without time discretization. This formulation enables more faithful modeling of irregular clinical events than conventional discrete-time architectures. We evaluate the approach on an ICU cohort in MIMIC-IV across multiple prediction horizons prior to sepsis onset. At the 24-hour prediction horizon, the model achieves an AUROC of 0.734 and an AUPRC of 0.769, outperforming existing methods. These results demonstrate that combining Transformer representations with continuous-time latent dynamics provides an effective approach to modeling irregular clinical time series and improving early sepsis risk prediction.

Speaker(s):
Yiran Wang, Master
Yale School of Medicine

Author(s):
Yiran Wang, Master - Yale School of Medicine; Jihoon Kim, PhD - Yale University;

Predicting Critical Care Outcomes from Temporally Sparse EHR Trajectories

Presentation Type: Paper - Student
Presentation Time: 03:54 PM - 04:06 PM

Abstract Keywords: Machine Learning, Deep Learning, Artificial Intelligence, Information Visualization, Patient-/Person-Generated Health Data, Data Mining, Evaluation, Causal Inference
Programmatic Theme: Clinical Informatics

We present a graph-based framework for predicting in-hospital mortality (IHM) and prolonged ICU length of stay (LOS >7 days) from irregularly sampled electronic health record data without imputation. Patient trajectories are modeled as directed graphs and processed through an Additive Temporal Graph Network (ATGN) whose additive output structure yields exact, time-resolved attributions by construction. We evaluate on MIMIC-IV (N=30,774) and eICU-CRD (N=63,652) using 5-fold cross-validation, comparing against logistic regression, XGBoost, GRU-D, and Transformer baselines. ATGN achieves the highest discrimination on seven of eight task-metric combinations (best AUROC: 0.897 for IHM on MIMIC-IV). Perturbation analysis identifies respiratory biomarkers and Glasgow Coma Scale as primary mortality drivers. These results suggest that preserving native temporal structure and providing faithful attributions can be achieved without sacrificing predictive performance relative to imputation-based alternatives.
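
Why an additive output yields exact attributions can be shown with a toy model. The sketch below is not ATGN and contains no graph network: the weights, bias, decay rate, and feature names are hypothetical. It only demonstrates the property that when a logit is a sum of per-event terms, each term is that event's attribution and the terms add up to the prediction with no approximation.

```python
import math
from typing import List, Tuple

Event = Tuple[float, str, float]  # (time in hours, feature name, value)

# Hypothetical parameters; a real model would learn these from data.
WEIGHTS = {"resp_rate": 0.04, "gcs": -0.15}
BIAS = -1.0
DECAY_PER_HOUR = 0.05


def contribution(event: Event, t_end: float) -> float:
    """Contribution of one observation, down-weighted by its age at t_end."""
    t, name, value = event
    return WEIGHTS[name] * value * math.exp(-DECAY_PER_HOUR * (t_end - t))


def predict_with_attributions(events: List[Event], t_end: float):
    """Return (probability, logit, per-event contributions).

    Observations are used at their native timestamps, so no imputation
    or resampling onto a regular grid is needed.
    """
    contribs = [contribution(e, t_end) for e in events]
    logit = BIAS + sum(contribs)
    prob = 1.0 / (1.0 + math.exp(-logit))
    return prob, logit, contribs


if __name__ == "__main__":
    # Irregularly spaced, illustrative observations.
    events = [(0.5, "resp_rate", 28.0), (3.2, "gcs", 9.0), (11.0, "resp_rate", 32.0)]
    prob, logit, contribs = predict_with_attributions(events, t_end=12.0)
    for event, c in zip(events, contribs):
        print(event, round(c, 4))
    print("logit:", round(logit, 4), "probability:", round(prob, 4))
```

Because the decomposition is part of the forward pass, the attributions need no post hoc explainer; the trade-off is that interactions must be built into the per-event terms rather than the output layer.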

Speaker(s):
Harsha Battula, MS
University of Minnesota Twin Cities

Author(s):
Harsha Battula, MS - University of Minnesota Twin Cities; Trevor Winger - University of Minnesota; Jaideep Srivastava - University of Minnesota;

Expected Changes in Mortality Associated with Performance Improvements of a Clinical Prediction Model for Sepsis: A Probabilistic Decision Analysis

Presentation Type: Paper - Regular
Presentation Time: 04:06 PM - 04:18 PM

Abstract Keywords: Clinical Decision Support, Machine Learning, Quantitative Methods
Programmatic Theme: Clinical Informatics

Clinical prediction models for sepsis exhibit variable performance and have not reliably improved care. Although multimodal data sources may improve predictive performance for such models, their associated impact on clinical care, even for models with good operating characteristics, is typically unknown. Using a probabilistic decision analysis framework, we quantified the expected improvements and associated uncertainty in sepsis-related mortality associated with the performance of a prediction model. We found that, on average, sepsis prediction models, even those with poor performance, are likely to improve mortality, and that benefits are likely highest in the intensive care unit. However, these findings are highly sensitive to estimates of treatment benefits and harms that likely vary by local population characteristics and practice patterns. This framework is useful to guide scientific and funding strategies for future sepsis models, and also for other settings where the effect of treatment guided by a predicted outcome is unknown.
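
A probabilistic decision analysis of this kind can be sketched as a Monte Carlo simulation. The version below is a simplified illustration, not the authors' model: the Beta distributions for prevalence, treatment benefit, and treatment harm are hypothetical placeholders, and `simulate_mortality_change` is a name introduced here.

```python
import random
from statistics import mean
from typing import Dict


def simulate_mortality_change(sensitivity: float,
                              specificity: float,
                              n_draws: int = 10_000,
                              seed: int = 0) -> Dict[str, float]:
    """Expected change in mortality (per patient screened) from acting on a model.

    Negative values mean fewer deaths. All input distributions are
    hypothetical assumptions chosen only to make the sketch runnable.
    """
    rng = random.Random(seed)
    changes = []
    for _ in range(n_draws):
        prevalence = rng.betavariate(10, 90)  # assumed sepsis prevalence
        benefit = rng.betavariate(5, 95)      # assumed absolute risk reduction, true positives
        harm = rng.betavariate(1, 199)        # assumed absolute risk increase, false positives
        true_pos = prevalence * sensitivity
        false_pos = (1 - prevalence) * (1 - specificity)
        changes.append(false_pos * harm - true_pos * benefit)
    changes.sort()
    return {
        "mean": mean(changes),
        "lower_2.5": changes[int(0.025 * n_draws)],
        "upper_97.5": changes[int(0.975 * n_draws) - 1],
    }


if __name__ == "__main__":
    # Compare a weaker and a stronger hypothetical model.
    print(simulate_mortality_change(sensitivity=0.6, specificity=0.7))
    print(simulate_mortality_change(sensitivity=0.9, specificity=0.9))
```

Widening or shifting the benefit and harm distributions changes the sign and size of the result, which is the sensitivity to treatment-effect estimates that the abstract highlights.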

Speaker(s):
Gary Weissman, MD, MSHP
University of Pennsylvania

Author(s):
Gary Weissman, MD, MSHP - University of Pennsylvania; Rebecca Hubbard, PhD - Brown University School of Public Health;

Machine Learning for Pre-Culture ESBL Risk Stratification to Guide Empiric Antibiotic Selection: A 12-Hospital Study of Enterobacteriaceae Cultures

Presentation Type: Paper - Student
Presentation Time: 04:18 PM - 04:30 PM

Abstract Keywords: Machine Learning, Infectious Diseases and Epidemiology, Clinical Decision Support, Healthcare Quality, Knowledge Representation & Information Modeling, Public Health, Informatics Implementation, Population Health
Programmatic Theme: Clinical Research Informatics

Empiric antibiotic therapy for suspected ESBL-producing Enterobacteriaceae must be selected 48-72 hours before culture results, forcing clinicians to choose between undertreating resistant infections and overusing carbapenems that drive further resistance. We developed a cost-sensitive XGBoost model predicting phenotypic ESBL production at culture ordering using 45 pre-culture EHR features across 132,955 cultures from 12 hospitals (14.41% ESBL-positive). At 90% sensitivity, the model achieved 95.7% NPV, reducing post-test ESBL probability to 4.3%, a threshold that may support safe carbapenem-sparing in non-ICU settings, while avoiding 337 unnecessary broad-spectrum courses per 1,000 cultures. SHAP analysis identified prior ESBL colonization and neighborhood deprivation as dominant predictors; removing deprivation features caused minimal performance loss (∆AUROC = −0.027), enabling equitable bedside deployment. The model maintained temporal stability and generalized across organism strata.
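
The link between NPV and post-test probability reported above is simple arithmetic: after a negative prediction, the residual ESBL probability is 1 minus NPV. The sketch below shows that step and the standard NPV formula. The abstract does not report specificity, so the value used in the second call is purely illustrative.

```python
def npv(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Negative predictive value from test characteristics and prevalence."""
    true_neg = specificity * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    return true_neg / (true_neg + false_neg)


def post_test_probability_after_negative(npv_value: float) -> float:
    """Probability of disease given a negative prediction."""
    return 1 - npv_value


if __name__ == "__main__":
    # Reported in the abstract: NPV of 95.7% at 90% sensitivity.
    print(round(post_test_probability_after_negative(0.957), 3))
    # Prevalence 14.41% and sensitivity 90% are from the abstract;
    # specificity=0.5 is a made-up value for illustration only.
    print(npv(sensitivity=0.90, specificity=0.5, prevalence=0.1441))
```

Because NPV depends on prevalence, the same model would yield a different post-test probability in a population with a different ESBL rate.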

Speaker(s):
Aravind Kuruvikkattil Venugopalan, Masters in Health Informatics
Student

Author(s):
Lalitha Pranathi Pulavarthy, MS Health Informatics - Indiana University Indianapolis; Rashmita Kudamala, Masters - Indiana University Indianapolis; Saptarshi Purkayastha, PhD - Indiana University, Luddy School of Informatics, Computing and Engineering;

Influence of AI Information on Complex Sepsis Treatment Decisions: A Randomized Experiment

Presentation Type: Podium Abstract
Presentation Time: 04:30 PM - 04:42 PM

Abstract Keywords: Human-computer Interaction, Clinical Decision Support, Artificial Intelligence, Quantitative Methods, User-centered Design Methods, Usability
Programmatic Theme: Clinical Informatics

Artificial intelligence (AI) could support sepsis treatment decisions, but its effects in complex cases are unclear. In this randomized experiment, 228 critical care physicians made decisions on six vignettes using three AI information designs. AI did not significantly affect decision quality, yet decisions shifted toward AI advice. Participants adapted and improved poor advice more easily when it was presented directly with fewer options. These results highlight the importance of tailoring AI information for human reasoning.

Speaker(s):
Venkatesh Sivaraman, PhD
UC San Francisco

Author(s):
Venkatesh Sivaraman, PhD - UC San Francisco; Joel Levin, PhD - UC San Diego; Eric Mason, MEng - Carnegie Mellon University; Victor Talisa, PhD - University of Pittsburgh; Suchi Saria, PhD - Johns Hopkins University; Jean Feng, PhD - UCSF; Andrew King, PhD, FAMIA - University of Pittsburgh; Jeremy Kahn, MD MS - University of Pittsburgh; Adam Perer - IBM Research;

S06: Code Red on the Range: Sepsis, Speed, and Survival (Oral Presentations)
