American Medical Informatics Association - Evaluating Large Language Models for Drafting Emergency Department Discharge Summaries

Presentation Time: 11:00 AM - 11:15 AM

Abstract Keywords: Large Language Models (LLMs), Evaluation, Natural Language Processing
Primary Track: Applications
Programmatic Theme: Clinical Informatics

We evaluated the performance of two large language models, GPT-3.5-turbo and GPT-4, in generating Emergency Department (ED) discharge summaries. Using 100 randomly selected ED encounters, we found that GPT-4 outperformed GPT-3.5-turbo, generating discharge summaries that were highly accurate but still prone to hallucinations and clinical omissions. While our results are promising, further work is needed to understand how to prevent LLM hallucinations and ensure that all clinically relevant information is included before clinical deployment.

Speaker(s):
Christopher Williams, MB BChir
UCSF

Date: Tuesday (11/12)
Room: Franciscan A