Bayesian Priors From Large Language Models Make Clinical Prediction Models More Interpretable
Presentation Time: 09:15 AM - 09:30 AM
Abstract Keywords: Machine Learning, Large Language Models (LLMs), Rule-based artificial intelligence, Natural Language Processing
Primary Track: Foundations
Programmatic Theme: Clinical Informatics
Training clinical machine learning (ML) models on thousands of features extracted from the Electronic Health Record (EHR) without any clinical curation can often lead to models that rely on spurious, clinically irrelevant features. On the other hand, because it is infeasible to have clinical experts review thousands of features, clinically curated feature sets are often sparse, and models trained on such features usually lack predictive power. We propose to leverage large language models (LLMs) to mimic the clinician’s input: we use an LLM to score the clinical relevance of EHR features and encode this information as a Bayesian prior for training a clinical ML model. In a case study training readmission risk prediction models, we show that this principled approach to integrating LLM-generated clinical priors yields models with high predictive power and far more interpretable feature sets.
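The abstract does not specify how the relevance scores are encoded as a prior, but one natural reading is a zero-mean Gaussian prior whose per-feature variance grows with the LLM's relevance score, so low-relevance features are shrunk toward zero. The sketch below is a minimal, hypothetical illustration of that idea (the scoring scheme, variance scaling, and optimizer are all assumptions, not the authors' implementation): a MAP estimate for logistic regression under feature-specific Gaussian priors.

```python
import numpy as np

def map_logistic(X, y, relevance, base_var=1.0, lr=0.5, n_iter=3000):
    """MAP logistic regression with a per-feature Gaussian prior.

    Hypothetical encoding of LLM relevance scores as a Bayesian prior:
        w_j ~ N(0, base_var * relevance_j)
    so features the LLM rates as clinically irrelevant (small relevance_j)
    get a tight prior around zero and are shrunk out of the model.
    """
    n, d = X.shape
    prior_prec = 1.0 / (base_var * np.asarray(relevance))  # per-feature precision
    w = np.zeros(d)
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ w))                   # predicted probabilities
        # gradient of (average negative log-likelihood + prior penalty)
        grad = (X.T @ (p - y) + prior_prec * w) / n
        w -= lr * grad
    return w

# Toy example: two equally predictive features, but the LLM (hypothetically)
# rates feature 1 as clinically irrelevant, so its coefficient is shrunk more.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + X[:, 1] + 0.3 * rng.normal(size=200) > 0).astype(float)
relevance = np.array([0.9, 0.05])  # hypothetical LLM relevance scores
w = map_logistic(X, y, relevance)
```

Under this encoding the data can still overrule the prior: a low-relevance feature with a strong enough signal retains a nonzero coefficient, which is the usual argument for a soft Bayesian prior over hard feature exclusion.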
Speaker(s):
Avni Kothari, MS
UCSF
Author(s):
Jean Feng, PhD; Lucas Zier, MD - UCSF; Seth Goldman, MD - UCSF; Daniel Bennett, MD - UCSF; Elizabeth Connelly, MPH - UCSF; James Marks, PhD - UCSF
Category
Podium Abstract
Description
Date: Tuesday (11/12)
Time: 09:15 AM to 09:30 AM
Room: Continental Ballroom 8-9