Evaluating Large Language Models for Drafting Emergency Department Discharge Summaries
Presentation Time: 11:00 AM - 11:15 AM
Abstract Keywords: Large Language Models (LLMs), Evaluation, Natural Language Processing
Primary Track: Applications
Programmatic Theme: Clinical Informatics
We evaluated the performance of two large language models, GPT-3.5-turbo and GPT-4, in generating Emergency Department (ED) discharge summaries. Using 100 randomly selected ED encounters, we found that GPT-4 outperformed GPT-3.5-turbo, generating discharge summaries that were highly accurate but still liable to hallucinations and clinical omissions. While our results are promising, further work is needed before clinical deployment to understand how to prevent LLM hallucinations and to ensure that all clinically relevant information is included.
Speaker(s):
Christopher Williams, MB BChir
UCSF
Category
Podium Abstract