Times are displayed in (UTC-08:00) Pacific Time (US & Canada)
11/11/2024 | 3:30 PM – 5:00 PM | Continental Ballroom 8-9
S53: Utilization Data and Data Utilization - Auditory Audits, Listening to the Data
Presentation Type: Oral
Session Chair:
Julia Adler-Milstein, PhD - UCSF School of Medicine
Generative AI Demonstrated Difficulty Reasoning on Nursing Flowsheet Data
Presentation Time: 03:30 PM - 03:45 PM
Abstract Keywords: Documentation Burden, Large Language Models (LLMs), Nursing Informatics
Primary Track: Applications
Programmatic Theme: Clinical Informatics
Excessive documentation burden is linked to clinician burnout, motivating efforts to reduce it. Generative artificial intelligence (AI) offers opportunities for burden reduction but requires rigorous assessment. We evaluated the ability of a large language model (LLM), OpenAI's GPT-4, to interpret various intervention-response relationships presented on nursing flowsheets, assessed its performance using MUC-5 evaluation metrics, and compared its assessments to those of nurse expert evaluators. ChatGPT correctly assessed 3 of 14 clinical scenarios and partially correctly assessed 6 of 14, frequently omitting data from its reasoning. Nurse expert evaluators correctly assessed all relationships and provided additional language, reflective of standard nursing practice, beyond the intervention-response relationships evidenced in nursing flowsheets. Future work should ensure that the training data used for electronic health record (EHR)-integrated LLMs include all types of narrative nursing documentation that reflect nurses' clinical reasoning, and that verification of LLM-based information summarization does not become burdensome for end users.
Speaker(s):
Courtney Diamond, MA
Columbia University
Author(s):
Courtney Diamond, MA - Columbia University; Jennifer Thate, PhD, CNE, RN - Siena College; Rachel Lee, PhD, RN - Columbia University; Jennifer Withall, PhD - Columbia University Department of Biomedical Informatics; Kenrick Cato, PhD, RN, CPHIMS, FAAN - University of Pennsylvania/ Children's Hospital of Philadelphia; Sarah Rossetti, RN, PhD - Columbia University Department of Biomedical Informatics;
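For readers unfamiliar with MUC-style scoring, a common convention awards half credit for partial matches. The sketch below applies that convention to the counts reported in the abstract (3 correct, 6 partial, of 14 scenarios); the function name and the half-credit weighting as applied here are our illustration, not the study's exact scoring procedure.

```python
def muc_partial_accuracy(correct: int, partial: int, total: int) -> float:
    """Accuracy with MUC-style half credit for partially correct assessments."""
    return (correct + 0.5 * partial) / total

# Counts from the abstract: 3 correct, 6 partially correct, 14 scenarios.
score = muc_partial_accuracy(correct=3, partial=6, total=14)
print(round(score, 3))  # (3 + 0.5 * 6) / 14 = 0.429
```

Under this weighting, GPT-4's lenient accuracy on the flowsheet scenarios works out to roughly 43%.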
Measuring Cognitive Effort using Tabular Language Models of EHR-based Audit Log Action Sequences
Presentation Time: 03:45 PM - 04:00 PM
Abstract Keywords: Workflow, Human-computer Interaction, Large Language Models (LLMs), Patient Safety, Standards
Primary Track: Foundations
Programmatic Theme: Clinical Informatics
Widespread adoption of Electronic Health Records (EHRs) has improved clinical work but has also increased clinicians' cognitive effort and associated workload. There is currently no standardized way to measure cognitive effort at scale. We developed a clinician-level metric, action entropy, that estimates cognitive effort using a neural language model trained on EHR audit logs, and validated this metric against known high-cognitive-effort scenarios.
Speaker(s):
Seunghwan Kim, MS
Washington University in St. Louis
Author(s):
Seunghwan Kim, MS - Washington University in St. Louis; Benjamin Warner, MS - Washington University in St. Louis; Daphne Lew, PhD, MPH - Washington University School of Medicine; Sunny Lou, MD, PhD - Washington University, St. Louis; Thomas Kannampallil, PhD - Washington University School of Medicine;
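As a rough sketch of the entropy idea (not the study's neural model), the snippet below computes the mean surprisal of each audit-log action given its predecessor. The action names and transition probabilities are hypothetical stand-ins for what a language model trained on audit logs would supply; predictable action sequences score low, improbable ones score high.

```python
import math

def action_entropy(sequence, probs):
    """Mean surprisal (bits) of each action given the preceding action.

    probs[(prev, cur)] plays the role of a trained model's P(cur | prev);
    here the probabilities are supplied directly for illustration.
    """
    surprisals = [-math.log2(probs[(prev, cur)])
                  for prev, cur in zip(sequence, sequence[1:])]
    return sum(surprisals) / len(surprisals)

seq = ["open_chart", "review_labs", "write_note"]

# A routine workflow (high transition probabilities) yields low entropy...
routine = {("open_chart", "review_labs"): 0.9, ("review_labs", "write_note"): 0.8}
low = action_entropy(seq, routine)

# ...while surprising transitions (e.g., interrupted work) yield high entropy.
erratic = {("open_chart", "review_labs"): 0.1, ("review_labs", "write_note"): 0.2}
high = action_entropy(seq, erratic)
assert high > low
```

The clinician-level metric in the study aggregates this kind of per-action surprisal over a clinician's full audit-log history.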
EHR Documentation Frequency Changes Across the COVID-19 Pandemic
Presentation Time: 04:00 PM - 04:15 PM
Abstract Keywords: Documentation Burden, Nursing Informatics, Data Mining
Primary Track: Applications
Programmatic Theme: Clinical Informatics
Documentation is a critical bridge between clinicians' workflows and direct patient care, requiring a delicate balance between meeting regulatory requirements and prioritizing patient needs. The documentation relaxation policies enacted during the COVID-19 pandemic provide an opportunity to explore nurses' documentation practices and the factors that contributed to their documentation decisions. We explored documentation trends for 10 flowsheet measure groups, formed by large language models from the clinical semantics of the measure names. The study period spanned the COVID-19 pandemic, including the intervals before and after implementation of a documentation relaxation policy, Surge Documentation. We established a pipeline to build regression models and visualizations of changes in documentation frequency trends. Documentation rates increased during COVID-19 and fell significantly after the documentation relaxation policy was implemented. We identified workload acuity-related factors, such as unit-level order sum, as contributing to more documentation, and demonstrated that nurses engaged their critical thinking to prioritize documentation based on workload and patient acuity. Future documentation policies should support nursing critical thinking and expertise in prioritizing nursing activities and patient care.
Speaker(s):
Hao Fan, MD
Washington University School of Medicine in St Louis
Author(s):
Hao Fan, MD - Washington University School of Medicine in St Louis; Sarah Rossetti, RN, PhD - Columbia University Department of Biomedical Informatics; Rosie Mugoya, BSN - Goldfarb School of Nursing and Washington University of St. Louis; Haomiao Jia, PhD - Columbia University Medical Center; Jennifer Thate, PhD, CNE, RN - Siena College; Amy Finnegan, PhD, MPA - Columbia University Medical Center; IntraHealth International; Albert Lai, PhD, FACMI, FAMIA - Washington University; Kenrick Cato, PhD, RN, CPHIMS, FAAN - University of Pennsylvania/ Children's Hospital of Philadelphia; Po-Yin Yen, PhD, RN, FAMIA, FAAN - Washington University in St. Louis;
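The regression-based trend analysis the abstract describes can be illustrated, very loosely, with an interrupted-time-series-style sketch: fit a pre-policy trend by ordinary least squares, then compare the first post-policy observation against what the trend predicts. All counts and the week-6 policy start are invented for illustration and do not reflect the study's data.

```python
def ols_fit(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return my - slope * mx, slope

# Hypothetical weekly flowsheet documentation counts; the relaxation
# policy takes effect at week 6 in this toy series.
pre = [(0, 40), (1, 42), (2, 45), (3, 47), (4, 50), (5, 52)]
post = [(6, 38), (7, 36), (8, 35)]

intercept, slope = ols_fit([t for t, _ in pre], [y for _, y in pre])
expected_week6 = intercept + slope * 6   # counterfactual: trend continues
observed_week6 = post[0][1]
drop = expected_week6 - observed_week6   # level change at the policy start
```

A positive `drop` corresponds to the abstract's finding that documentation rates, which were rising during COVID-19, fell once the relaxation policy took effect.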
Predictors and Consequences of Primary Care Physicians’ Reductions in Clinical Effort: A Nationwide EHR Audit Log Study
Presentation Time: 04:15 PM - 04:30 PM
Abstract Keywords: Internal Medicine or Medical Subspecialty, Workflow, Bioinformatics
Primary Track: Applications
Programmatic Theme: Clinical Informatics
Many PCPs express an intent to reduce their clinical effort. Using national Epic Signal data representing 23,203 PCPs, we found that 9.1% of PCPs reduced their clinical effort from 2019 to 2022 (median 30% reduction in monthly visits). Greater baseline monthly days worked and greater EHR use on unscheduled days were positively associated with clinical effort reduction. PCPs who reduced their clinical effort saw older patients and had a 21.7% relative increase in total EHR time per visit post-reduction.
Speaker(s):
Lisa Rotenstein, MD, MBA, MSc
UCSF
Author(s):
Gabe Weinreb, BA - Harvard Business School; A J Holmgren, PhD - University of California, San Francisco; Nate Apathy, PhD - University of Maryland; David Bates, MD - Brigham and Women's Hospital; Bruce Landon, MD, MBA - Harvard Medical School; Lisa Rotenstein, MD, MBA, MSc - UCSF;
Using think-aloud protocol to identify cognitive events to generate data-driven scientific hypotheses by inexperienced clinical researchers
Presentation Time: 04:30 PM - 04:45 PM
Abstract Keywords: Human-computer Interaction, Information Visualization, Controlled Terminologies, Ontologies, and Vocabularies
Primary Track: Foundations
Programmatic Theme: Clinical Research Informatics
We conducted a data-driven hypothesis generation study with clinical researchers using VIADS (a visual interactive analysis tool for filtering and summarizing large data sets coded with hierarchical terminologies) or other analytical tools (e.g., SPSS, SAS, R) as a control. Participants analyzed the same datasets and developed hypotheses using a think-aloud verbal protocol. Their screen activities and audio were recorded, transcribed, and coded for cognitive events (e.g., “Analyze data,” “Seek connection”) during hypothesis generation. The VIADS group exhibited the lowest mean number of cognitive events per hypothesis, with the smallest standard deviation. The most frequent cognitive events during hypothesis generation were “Using analysis results” (30%) and “Seeking connections” (23%). These results suggest that VIADS may guide participants better than the control tools and help explain the shorter average time the VIADS group needed to generate each hypothesis.
Speaker(s):
Xia Jing, MD, PhD
Clemson University
Author(s):
Xia Jing, MD, PhD - Clemson University; Brooke Draghi, BS - Clemson University; Mytchell Ernst, BS - Clemson University; Vimla Patel, PhD - New York Academy of Medicine; James Cimino, MD, FACMI, FACP, FAMIA - Department of Biomedical Informatics and Data Science, Heersink School of Medicine, University of Alabama at Birmingham; Jay Shubrook, DO - Touro University; Yuchun Zhou, PhD - Ohio University; Chang Liu, PhD - Ohio University; Sonsoles De Lacalle, MD, PhD - California State University Channel Islands;
Optimizing Large Language Models for Discharge Prediction: Best Practices in Leveraging Electronic Health Record Audit Logs
Presentation Time: 04:45 PM - 05:00 PM
Abstract Keywords: Large Language Models (LLMs), Machine Learning, Knowledge Representation and Information Modeling
Primary Track: Applications
Programmatic Theme: Clinical Research Informatics
Electronic Health Record (EHR) audit logs are increasingly used for clinical tasks, from workflow modeling to predictive analyses of discharge events, adverse kidney outcomes, and hospital readmissions. These logs capture user-EHR interactions, reflecting both healthcare professionals' behavior and patients' health statuses. To harness this temporal information effectively, this study explores the application of Large Language Models (LLMs) to audit log data for clinical prediction tasks, focusing on discharge prediction. Using a year's worth of EHR data from Vanderbilt University Medical Center, we fine-tuned LLMs with 10,000 randomly selected training examples. LLaMA-2 70B, with an AUROC of 0.80 [0.77-0.82], outperformed both GPT-4 128K in a zero-shot setting, with an AUROC of 0.68 [0.65-0.71], and DeBERTa, with an AUROC of 0.78 [0.75-0.82]. Among the serialization methods evaluated, the first-occurrence approach, wherein only the initial appearance of each event in a sequence is retained, performed best. Furthermore, for the fine-tuned LLaMA-2 70B, logit outputs yielded a higher AUROC of 0.80 [0.77-0.82] than text outputs, with an AUROC of 0.69 [0.67-0.72]. This study underscores the potential of fine-tuned LLMs, particularly when combined with strategic sequence serialization, for clinical prediction tasks.
Speaker(s):
Xinmeng Zhang, BS
Vanderbilt University
Author(s):
Xinmeng Zhang, BS - Vanderbilt University; Chao Yan, PhD - Vanderbilt University Medical Center; Yuyang Yang - Northwestern University; Zhuohang Li, MS - Vanderbilt University; Yubo Feng, MS - Vanderbilt University; Bradley Malin, PhD - Vanderbilt University Medical Center; You Chen, PhD - Vanderbilt University;
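The first-occurrence serialization the abstract describes — keeping only each event's initial appearance while preserving order — is straightforward to sketch. The event names below are hypothetical; the study serializes real audit-log action codes.

```python
def first_occurrence(events):
    """Keep only the first appearance of each event, preserving order.

    This compresses long, repetitive audit-log sequences before they are
    serialized into an LLM's context window.
    """
    seen = set()
    out = []
    for event in events:
        if event not in seen:
            seen.add(event)
            out.append(event)
    return out

audit_log = ["open_chart", "review_labs", "open_chart", "write_note", "review_labs"]
print(first_occurrence(audit_log))  # ['open_chart', 'review_labs', 'write_note']
```

Deduplicating this way shortens sequences considerably, which plausibly helps a fine-tuned model attend to distinct event types rather than repetition.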
Description
Date: Monday (11/11)
Time: 3:30 PM to 5:00 PM
Room: Continental Ballroom 8-9