Generative AI Demonstrated Difficulty Reasoning on Nursing Flowsheet Data
Presentation Time: 03:30 PM - 03:45 PM
Abstract Keywords: Documentation Burden, Large Language Models (LLMs), Nursing Informatics
Primary Track: Applications
Programmatic Theme: Clinical Informatics
Excessive documentation burden is linked to clinician burnout, motivating efforts to reduce burden. Generative artificial intelligence (AI) presents opportunities for burden reduction but requires rigorous assessment. We evaluated the ability of a large language model (LLM) (OpenAI’s GPT-4) to interpret various intervention-response relationships presented on nursing flowsheets, assessing performance using MUC-5 evaluation metrics, and compared its assessments to those of nurse expert evaluators. ChatGPT correctly assessed 3 of 14 clinical scenarios and partially correctly assessed 6 of 14, frequently omitting data from its reasoning. Nurse expert evaluators correctly assessed all relationships and provided additional language reflective of standard nursing practice beyond the intervention-response relationships evidenced in nursing flowsheets. Future work should ensure that the training data used for electronic health record (EHR)-integrated LLMs includes all types of narrative nursing documentation that reflect nurses’ clinical reasoning, and that verification of LLM-based information summarization does not become burdensome for end-users.
Speaker(s):
Courtney Diamond, MA
Columbia University
Author(s):
Courtney Diamond, MA - Columbia University; Jennifer Thate, PhD, CNE, RN - Siena College; Rachel Lee, PhD, RN - Columbia University; Jennifer Withall, PhD - Columbia University Department of Biomedical Informatics; Kenrick Cato, PhD, RN, CPHIMS, FAAN - University of Pennsylvania/Children's Hospital of Philadelphia; Sarah Rossetti, RN, PhD - Columbia University Department of Biomedical Informatics
Category
Paper - Student
Description
Date: Monday (11/11)
Time: 03:30 PM to 03:45 PM
Room: Continental Ballroom 8-9