2025 Annual Symposium Program Gallery
Times are displayed in (UTC-04:00) Eastern Time (US & Canada)
11/18/2025 |
8:00 AM – 9:15 AM |
Room 8
S61: Prompt and Circumstance: Contextual Intelligence in the Age of LLMs
Presentation Type: Oral Presentations
Using LLMs to Interpret Arterial Blood Gases: Comparison of a Novel Math Scratchpad with Different Prompting Methods in a Three-Arm Trial
Presentation Time: 08:00 AM - 08:12 AM
Abstract Keywords: Artificial Intelligence, Large Language Models (LLMs), Clinical Decision Support, Diagnostic Systems
Primary Track: Applications
Programmatic Theme: Clinical Research Informatics
Large language models (LLMs) have demonstrated proficiency in various tasks, yet their effectiveness in clinical decision support systems (CDSS) is still evolving. One challenge is their limited ability to perform calculations. This study evaluates a novel method of using a custom math scratchpad to perform domain-specific calculations when interpreting arterial blood gases (ABGs). Three methods are compared in a three-arm trial: zero-shot prompting (Method 1), prompt engineering with Retrieval-Augmented Generation (RAG) (Method 2), and a combination of the novel math scratchpad, RAG, and prompt engineering (Method 3). Across a database of 50 ABG results, the LLM-powered CDSS achieved 86% accuracy (43/50) [confidence interval (CI) 73.81%-93.05%] with Method 3, compared to 78% (39/50) [CI 64.76%-87.25%] for Method 2 and 48% (24/50) [CI 34.8%-61.49%] for Method 1. The evaluation demonstrates a math scratchpad's utility in ABG interpretation by overcoming LLMs' calculation limitations. Further testing with real-world patient ABG data is needed.
Speaker:
Praveen Meka, MD
Dana Farber Cancer Institute
Authors:
Christine Silvers, MD, PhD - Amazon Web Services; Qing Liu, BE - Amazon Web Services; Bharath Gunapati, Sr. Solutions Architect - Amazon Web Services;
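The scratchpad idea in this abstract can be illustrated with a small deterministic helper. The function below is a hypothetical sketch, not the authors' implementation: it shows the kind of domain-specific arithmetic (anion gap, Winter's formula) an LLM could delegate to a tool instead of computing in free text.

```python
def abg_scratchpad(na: float, cl: float, hco3: float,
                   pco2: float, ph: float) -> dict:
    """Deterministic ABG calculations an LLM could call as a tool.

    Illustrative only: anion gap plus Winter's formula for the
    expected pCO2 compensation in metabolic acidosis.
    """
    anion_gap = na - (cl + hco3)        # normal roughly 8-12 mEq/L
    expected_pco2 = 1.5 * hco3 + 8      # Winter's formula (+/- 2)
    return {
        "ph_status": ("acidemia" if ph < 7.35
                      else "alkalemia" if ph > 7.45 else "normal"),
        "anion_gap": anion_gap,
        "expected_pco2": expected_pco2,
        "appropriate_compensation": abs(pco2 - expected_pco2) <= 2,
    }
```

Returning exact values to the model sidesteps free-text arithmetic, which is the limitation Method 3 targets.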
Cultural Prompting Improves the Empathy and Cultural Responsiveness of GPT-Generated Therapy Responses
Presentation Time: 08:12 AM - 08:24 AM
Abstract Keywords: Patient Engagement and Preferences, Nursing Informatics, Health Equity, Fairness and elimination of bias, Artificial Intelligence, Large Language Models (LLMs)
Primary Track: Applications
Programmatic Theme: Consumer Health Informatics
Large Language Model (LLM)-based conversational agents offer promising solutions for mental health support, but often lack cultural responsiveness for diverse populations. This study evaluated the effectiveness of cultural prompting in improving cultural responsiveness and perceived empathy of LLM-generated therapeutic responses for Chinese American family caregivers. Using a randomized controlled experiment, we compared GPT-4o and DeepSeek-V3 responses with and without cultural prompting. Thirty-six participants evaluated input-response pairs on cultural responsiveness (competence and relevance) and perceived empathy. Results showed that cultural prompting significantly enhanced GPT-4o’s performance across all dimensions, with GPT-4o with cultural prompting being the most preferred, while improvements in DeepSeek-V3 responses were not significant. Mediation analysis revealed that cultural prompting improved empathy through improving cultural responsiveness. This study demonstrated that prompt-based techniques can effectively enhance the cultural responsiveness of LLM-generated therapeutic responses, highlighting the importance of cultural responsiveness in delivering empathetic AI-based therapeutic interventions to culturally and linguistically diverse populations.
Speaker:
Serena Jinchen Xie, Masters
Biomedical Informatics and Medical Education, University of Washington
Authors:
Shumenghui Zhai, PHD, MPH - Pacific Lutheran University; Yanjing Liang, MS - University of Washington; Jingyi Li, PhD - University of Washington; Xuehong Fan, MS - University of Washington; Trevor Cohen, MBChB, PhD - Biomedical Informatics and Medical Education, University of Washington; Weichao Yuwen, PhD, RN - University of Washington Tacoma;
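Cultural prompting as evaluated here amounts to conditioning the model with a culture-specific system instruction. A minimal sketch, assuming a chat-style message format; the instruction wording below is hypothetical, not the study's actual prompt:

```python
def build_prompt(user_message: str, cultural: bool = False) -> list:
    """Assemble a chat-style message list, optionally prepending a
    cultural-prompting clause to the system instruction."""
    system = "You are a supportive therapist for family caregivers."
    if cultural:
        system += (
            " Respond with awareness of Chinese American cultural values,"
            " such as family obligation and indirect emotional expression."
        )
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": user_message},
    ]
```

The experimental contrast is then simply the same user input sent with `cultural=True` versus `cultural=False`.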
Is Tree-of-Thought Prompting Strategy Better than Chain-of-Thought? Vaping Cessation Analysis Using Large Language Models
Presentation Time: 08:24 AM - 08:36 AM
Abstract Keywords: Public Health, Large Language Models (LLMs), Information Extraction
Primary Track: Foundations
Vaping is gaining popularity among adolescents and poses risks to users. Social media platforms such as Reddit provide insights into user behaviors regarding vaping. In previous studies, our team explored the ability of large language models (LLMs) to perform binary classification at the sentence level to determine whether LLMs can be used to identify vaping cessation application users. Maintaining this goal, this study expands to compare OpenAI’s o1 and o3-mini, Google’s Gemini 2.0 Flash and Gemma 2, Meta’s Llama 3.3, DeepSeek’s R1, and xAI’s Grok 2 against human annotators to identify which models best perform binary classification to detect quit-vaping intention and multiclass classification to detect quit stages. We tested these models with emerging chain-of-thought and tree-of-thought prompts as well as simple prompts to see which strategy performed best. To our knowledge, this is the first investigation of tree-of-thought prompting for this task. Our initial results indicate that tree-of-thought and chain-of-thought prompting do not boost performance.
Speaker:
Lucas Aust, Student
University of South Carolina
Authors:
Lucas Aust, Undergraduate Student - Hi3 Tech Lab; Ming Huang, PhD - UTHealth Houston; Anthony Fu, Student - Hi3 Tech Lab;
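The three prompting strategies compared in this study can be sketched as template variants on a shared classification task; the wording below is illustrative, not the authors' exact prompts:

```python
def make_prompt(post: str, strategy: str = "simple") -> str:
    """Build a quit-vaping-intention classification prompt using one
    of three strategies: simple, chain-of-thought, tree-of-thought."""
    task = (f'Post: "{post}"\n'
            "Does the author intend to quit vaping? Answer yes or no.")
    if strategy == "simple":
        return task
    if strategy == "chain_of_thought":
        # Elicit a linear reasoning trace before the final answer.
        return task + "\nThink step by step before answering."
    if strategy == "tree_of_thought":
        # Elicit several candidate readings, then a selection step.
        return (task + "\nPropose three distinct interpretations of the"
                " post, evaluate each, then choose the best-supported"
                " answer.")
    raise ValueError(f"unknown strategy: {strategy}")
```

Holding the task text fixed across strategies isolates the effect of the prompting style itself, which is what the comparison against human annotators requires.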
Large Language Model-Powered Conversational Agent Delivering Problem-Solving Therapy (PST) for Family Caregivers: Enhancing Empathy and Therapeutic Alliance Using In-Context Learning
Presentation Time: 08:36 AM - 08:48 AM
Abstract Keywords: Large Language Models (LLMs), Mobile Health, Chronic Care Management
Primary Track: Applications
Programmatic Theme: Public Health Informatics
Family caregivers often face substantial mental health challenges due to their multifaceted roles and limited resources. This study explored the potential of a large language model (LLM)-powered conversational agent to deliver evidence-based mental health support for caregivers, specifically Problem-Solving Therapy (PST) integrated with Motivational Interviewing (MI) and Behavioral Chain Analysis (BCA). A within-subject experiment was conducted with 28 caregivers interacting with four LLM configurations to evaluate empathy and therapeutic alliance. The best-performing models incorporated Few-Shot and Retrieval-Augmented Generation (RAG) prompting techniques, alongside clinician-curated examples. The models showed improved contextual understanding and personalized support, as reflected by qualitative responses and quantitative ratings of perceived empathy and therapeutic alliance. Participants valued the model’s ability to validate emotions, explore unexpressed feelings, and provide actionable strategies. However, balancing thorough assessment with efficient advice delivery remains a challenge. This work highlights the potential of LLMs in delivering empathetic and tailored support for family caregivers.
Speaker:
Liying Wang, PhD
Florida State University
Authors:
Liying Wang, PhD - Florida State University; Daffodil Carrington, MS in Clinical Informatics and Patient Centered Technologies - University of Washington Seattle; Daniil Filienko, BS in Computer Science and Systems - University of Washington Tacoma; Caroline Jazmi, MS Artificial Intelligence - University of Washington, Seattle; Serena Jinchen Xie, Masters - Biomedical Informatics and Medical Education, University of Washington; Martine De Cock, PhD in Computer Science - University of Washington, Tacoma; Sarah Iribarren, PhD - University of Washington; Weichao Yuwen, PhD, RN - University of Washington Tacoma;
Automating Lung-RADS Categorization and Follow-Up Recommendations Using In-Context Learning with Large Language Models
Presentation Time: 08:48 AM - 09:00 AM
Abstract Keywords: Bioinformatics, Artificial Intelligence, Large Language Models (LLMs), Clinical Decision Support, Clinical Guidelines, Information Extraction, Natural Language Processing
Primary Track: Applications
Programmatic Theme: Clinical Research Informatics
Lung cancer remains a significant challenge in public health, ranking among the leading causes of cancer-related mortality. Low-dose computed tomography (LDCT)-based lung cancer screening has emerged as an effective tool for early detection, particularly in high-risk populations. However, interpreting lung nodule characteristics from radiology reports can be time-consuming and labor-intensive due to the length and inherent ambiguity of the reports, even with standardized reporting requirements like Lung-RADS, and deriving Lung-RADS assessments from the original reports places a significant burden on radiologists. This study addresses these challenges by developing an in-context learning framework utilizing large language models (LLMs), aiming to identify the approach that most accurately categorizes lung nodules and streamlines management decisions, providing robust and interpretable decision support. Overall, this research aims to reduce the time and effort radiologists spend on lung cancer screening, ultimately enhancing efficiency and accuracy and enabling timely and precise interventions.
Speaker:
Tiancheng Zhou, M.S
University of Florida
Authors:
Tiancheng Zhou, M.S - University of Florida; Aokun Chen, PhD - University of Florida; Yu Hu, M.S - University of Florida; Xiwei Lou, M.S - University of Florida; Xing He, Ph.D. - Indiana University; Yu Huang, Ph.D. - Indiana University; Bruno Hochhegger, MD, Ph.D. - University of Florida; Hiren Mehta, MD - University of Florida; Mattia Prosperi, PhD, FAMIA - University of Florida; Jiang Bian, Ph.D. - Indiana University;
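In-context learning here means prepending labeled demonstrations to each new report before asking for a category. A minimal sketch, with hypothetical report/category pairs rather than the study's actual data:

```python
# Hypothetical demonstrations (baseline-screening findings), for
# illustration only -- not drawn from the study's dataset.
EXAMPLES = [
    ("Solid nodule, 3 mm, right upper lobe.", "Lung-RADS 2"),
    ("Solid nodule, 9 mm at baseline screening.", "Lung-RADS 4A"),
]

def build_icl_prompt(report: str) -> str:
    """Few-shot (in-context learning) prompt: labeled demonstrations
    first, then the new report to categorize."""
    shots = "\n\n".join(f"Report: {r}\nCategory: {c}"
                        for r, c in EXAMPLES)
    return f"{shots}\n\nReport: {report}\nCategory:"
```

The demonstrations anchor the model to the Lung-RADS label space and output format, so the completion can be parsed directly into a category and its guideline-defined follow-up recommendation.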
Shifting Information Needs in Clinical Practice: The Evolving Role of Generative AI in Addressing Clinician Demands for Context-Specific Knowledge
Presentation Time: 09:00 AM - 09:12 AM
Abstract Keywords: Clinical Decision Support, Surveys and Needs Analysis, Large Language Models (LLMs)
Primary Track: Foundations
Programmatic Theme: Clinical Informatics
This study explores clinicians’ evolving information needs and evaluates the potential of Generative Artificial Intelligence (Gen AI) to address these gaps by reassessing and extending the Currie et al. (2003) taxonomy. Despite advancements in electronic health records (EHRs), unresolved information needs persist, impacting clinical efficiency and patient care. A cross-sectional survey conducted at Columbia University Irving Medical Center (CUIMC) analyzed clinician-generated Gen AI prompts, comparing them against the 2003 taxonomy. Findings reveal that while 80% of prompts align with existing categories, 20% represent emerging needs, including AI-driven workflow optimization and fairness-related inquiries. These findings highlight the necessity of adapting clinical decision support frameworks to integrate AI-driven solutions, ensuring that modern tools meet evolving clinician needs. By formally extending the Currie et al. taxonomy, this study provides a foundational framework for leveraging Gen AI to bridge long-standing information gaps and enhance patient outcomes in an increasingly complex healthcare environment.
Speaker:
Sachleen Tuteja, BS in Data Science and Statistics
Northwestern University
Authors:
Sachleen Tuteja, BS in Data Science and Statistics - Northwestern University; Elise Boventer, MD, MPH - Northwell Health; Abdulaziz Alkattan, Clinical Informatics - NYP/Columbia; Noémie Elhadad, PhD - Columbia University; Sarah Rossetti, RN, PhD - Columbia University Department of Biomedical Informatics;