5/19/2026 | 2:00 PM – 3:15 PM | Mt. Sopris B - Grand Hyatt Denver, Lobby Level
CI25: Guardrails, Governance, and Getting It Right (Oral Presentations)
Presentation Type: Oral Presentations
2026 CIC Health Equity Presentation
Session Credits: 1.25
Local PHI Scrubber: A Chrome Extension for On-Device PHI Redaction in Clinical LLM Workflows
Presentation Type: Oral Presentation - Student
Presentation Time: 02:00 PM - 02:12 PM
Abstract Keywords: Data Privacy, Cybersecurity, Reliability, and Security, Generative AI in Clinical Workflow: Ambient Listening, Chart Summarization, Automated Response with LLM, Human Factors and Usability, Clinician Well-Being, Workforce Automation, Communication, and Workflow Efficiency, Infrastructure and Cloud Computing
Primary Track: Implementing Real-World Change, Digital Engagement, and Connected Health
Clinicians are increasingly experimenting with large language models (LLMs) for literature search, documentation help, and clinical reasoning, but many of the most capable systems run in the browser and are not covered by a health system’s Business Associate Agreement. That creates a practical tension: clinicians want the benefit of tools like ChatGPT, Claude, and Gemini, yet Protected Health Information (PHI) must not leave HIPAA-aligned environments.
Local PHI Scrubber is an open-source Chrome extension that explores a third option. Instead of sending PHI to a cloud service, the extension connects to a locally hosted Microsoft Phi-3 Mini model via Ollama on the user’s own machine. When a clinician pastes clinical text into the popup and clicks “Scrub PHI,” the extension sends the text plus a Safe Harbor-style de-identification prompt to a localhost HTTP endpoint, receives a version with identifiers replaced by placeholders such as NAME, DATE, and MRN, and then displays the original and scrubbed text side by side for mandatory human review. A single click can then inject the scrubbed text into web-based LLM chat interfaces.
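The round trip to the local model can be sketched in Python. The extension itself runs in the browser; the endpoint, model tag, and prompt wording below are illustrative assumptions based on Ollama's standard local API, not the project's exact implementation:

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint
MODEL = "phi3"  # assumed tag for the locally hosted Phi-3 Mini model

SCRUB_PROMPT = (
    "Replace every HIPAA Safe Harbor identifier in the text below with a "
    "placeholder such as [NAME], [DATE], or [MRN]. Return only the scrubbed text.\n\n"
)

def build_payload(clinical_text: str) -> dict:
    """Assemble the JSON body sent to the localhost endpoint."""
    return {
        "model": MODEL,
        "prompt": SCRUB_PROMPT + clinical_text,
        "stream": False,  # request one complete response instead of a token stream
    }

def scrub(clinical_text: str) -> str:
    """POST to the local model and return the scrubbed text for human review."""
    req = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(build_payload(clinical_text)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

Note that nothing leaves the machine in this pattern: the only network call is to localhost, and the scrubbed output is displayed beside the original for mandatory review before any paste into an external chat interface.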
In this presentation I will describe the architecture, show live examples of success and failure cases, and discuss how the design foregrounds risk with warning-heavy UI, limited permissions, and no telemetry. Attendees will leave with a concrete, reproducible pattern for on-device PHI redaction that supports safer experimentation with external LLMs while keeping humans in the loop and PHI on local hardware.
Speaker(s):
James Weatherhead, MD-PhD Student
University of Texas Medical Branch (UTMB)
Author(s):
James Weatherhead, MD-PhD Student - University of Texas Medical Branch (UTMB);
Peter McCaffrey, MD - UTMB;
George Golovko, PhD - University of Texas Medical Branch
Designing a Post-Deployment AI Evaluation Pipeline: A Health System Approach to Streamlining Innovation while Maintaining Scientific Rigor, Scalability, and Sustainability
Presentation Type: Oral Presentation - Regular
Presentation Time: 02:12 PM - 02:24 PM
Abstract Keywords: Analytical Artificial Intelligence: ML, Digital Pathology, Imaging AI, Predictive Analytics, Governance, Change Management, Innovation Partnerships, Implementation Science, and Learning Health Systems
Primary Track: Implementing Real-World Change, Digital Engagement, and Connected Health
We share lessons from the ongoing development of the Mayo Clinic AI Validation & Stewardship Program (AVSP), which is establishing a scalable, systematic post-deployment evaluation pipeline for AI models in production that monitors clinical AI performance, mitigates bias, and supports sustained clinical and business value. AVSP is designing a socio-technical evaluation pipeline that integrates monitoring, real-world performance auditing, and human-centered AI evaluation, supported by automated monitoring dashboards, clinician feedback loops, and periodic audits. Planned methods adopt concepts from diagnostic testing, including sampling strategies for confirmatory testing of predicted negatives to enable continuous monitoring of sensitivity and the false omission rate (FOR). Strategic stakeholder alignment has guided our multidisciplinary design to support enterprise-wide adoption and help ensure AI remains clinically relevant, operationally efficient, and trustworthy.
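As a hypothetical illustration of the sampling idea (the function and workflow here are my own sketch, not AVSP's implementation): confirmatory testing of a random sample of predicted negatives yields an estimated false omission rate, which in turn lets a deployed model's sensitivity be tracked without adjudicating every case.

```python
def estimate_for_and_sensitivity(
    confirmed_tp: int,    # predicted positives adjudicated as true positives
    n_pred_neg: int,      # total predicted negatives in the monitoring window
    sampled_neg: int,     # predicted negatives sent for confirmatory testing
    sampled_neg_pos: int, # of those, how many were actually positive
) -> tuple[float, float]:
    """Estimate false omission rate (FOR) and sensitivity from a
    confirmatory sample of predicted negatives.

    FOR = FN / (FN + TN); among predicted negatives this is simply the
    fraction that turn out positive, so the sample proportion estimates it.
    """
    for_hat = sampled_neg_pos / sampled_neg
    # Extrapolate missed positives across all predicted negatives.
    fn_hat = for_hat * n_pred_neg
    sens_hat = confirmed_tp / (confirmed_tp + fn_hat)
    return for_hat, sens_hat

# Example: 90 confirmed TPs; 2 of 100 sampled negatives were misses,
# out of 1,000 predicted negatives overall -> FOR 0.02, ~20 estimated FNs.
for_hat, sens_hat = estimate_for_and_sensitivity(90, 1000, 100, 2)
```

In practice the sample estimate would carry a confidence interval, and drift alarms would fire when it exceeds a pre-registered threshold; the sketch shows only the core arithmetic.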
Speaker(s):
Shauna Overgaard, Ph.D.
Mayo Clinic
Author(s):
Young Juhn, M.D., M.P.H. - Mayo Clinic;
Chung Wi, MD - Mayo Clinic;
Momin Malik, PhD in Societal Computing - Mayo Clinic;
Deepak Sharma, MS - Mayo Clinic;
Shauna Overgaard, Ph.D. - Mayo Clinic
Current Approaches to Evaluating Large Language Model Outcomes for Healthcare Tasks: Implications for Research
Presentation Type: Oral Presentation - Regular
Presentation Time: 02:24 PM - 02:36 PM
Abstract Keywords: Health Data Science, Generative AI in Clinical Workflow: Ambient Listening, Chart Summarization, Automated Response with LLM
Primary Track: Big Data for Health
Large language models (LLMs) can support care delivery, for example by helping draft visit notes and supporting patient education. As LLMs are considered for implementation in clinical practice, evaluations of various outcomes are needed across the multiple stages of implementation. Drawing on a narrative review, we describe current approaches to evaluating LLM outcomes and their shortcomings, provide methodological considerations for planning an LLM evaluation to improve study rigor, and explore understudied human-factors outcomes.
Speaker(s):
Oliver Nguyen, MSHI
University of Wisconsin at Madison
Author(s):
Reshma Sahithi Dangeti, MS - Washington University in St. Louis;
Joanna Abraham, PhD, FACMI, FAMIA - Department of Anesthesiology and Institute for Informatics, Data Science and Biostatistics at Washington University in St. Louis, School of Medicine;
Oliver Nguyen, MSHI - University of Wisconsin at Madison
Resolving Clinical Ambiguity in ICU Notes: Policy-Aware Prompting Improves LLM Reasoning for Feeding Tube Status Classification
Presentation Type: Oral Presentation - Regular
Presentation Time: 02:36 PM - 02:48 PM
Abstract Keywords: Analytical Artificial Intelligence: ML, Digital Pathology, Imaging AI, Predictive Analytics, Governance, Generative AI in Clinical Workflow: Ambient Listening, Chart Summarization, Automated Response with LLM, Data Privacy, Cybersecurity, Reliability, and Security
Primary Track: Big Data for Health
Clinical narratives frequently contain ambiguous statements about temporality, purpose, and evidentiary certainty, making automated extraction of device status difficult and error-prone. Determining whether a feeding tube is actively used for nutrition is particularly challenging because documentation often includes planned procedures, temporary interruptions, or non-nutritional uses such as decompression. Large language models (LLMs) show promise for clinical information extraction, but they frequently over-infer, hallucinate, or inconsistently interpret underspecified text—especially in closed-network hospital environments where cloud-hosted APIs cannot be used.
We developed a policy-aware prompting framework designed to align LLM reasoning with clinical interpretive norms. Using Gemini-assisted retrieval and dual physician review, we constructed a gold-standard dataset of 395 ICU notes from MIMIC-III. Ambiguity was formalized into a taxonomy comprising temporal, purpose, and evidentiary categories, which were translated into explicit interpretive policies. Three locally served Gemini models (Flash-Lite, Flash, Pro) were evaluated across six prompting strategies within a secure, firewall-restricted environment.
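A policy-aware prompt of the kind described might be assembled as follows. The policy wording, labels, and function are illustrative stand-ins; the study's actual taxonomy-derived policies are not reproduced here:

```python
# Illustrative sketch: turn an ambiguity taxonomy into explicit
# interpretive policies prepended to the classification instruction.
POLICIES = {
    "temporal": "A tube that is planned or awaiting placement is NOT in active use.",
    "purpose": "A tube documented only for decompression or venting is NOT "
               "nutritional use.",
    "evidentiary": "If the note never states that feeds are running, label the "
                   "case Unclear rather than inferring use.",
}

LABELS = ("Active nutritional use", "Not in use", "Unclear")

def build_prompt(note_text: str) -> str:
    """Embed explicit interpretive rules ahead of the task and the note."""
    policy_block = "\n".join(
        f"- {name.capitalize()} policy: {rule}" for name, rule in POLICIES.items()
    )
    return (
        "Classify the feeding tube status in the ICU note below as one of: "
        + ", ".join(LABELS) + ".\n"
        "Apply these interpretive policies before deciding:\n"
        f"{policy_block}\n\n"
        f"Note:\n{note_text}\n"
        "Answer with the label only."
    )
```

The point of the pattern is that the model's handling of underspecified text is pinned to written rules rather than left to implicit priors, which is what the reported variance reduction suggests.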
Policy-aware prompting substantially improved multi-class performance, most notably in Gemini Flash, where macro F1 increased from 0.572 (vanilla) to 0.734. Correct classification of “Unclear” cases increased more than threefold, and run-to-run variance decreased five-fold, demonstrating enhanced reproducibility. These gains were achieved without external APIs, fine-tuning, or additional computational overhead.
Our findings show that clinical ambiguity is a structured, predictable phenomenon and that embedding explicit interpretive rules into prompts provides a lightweight yet powerful alignment mechanism for LLMs. Policy-aware prompting offers a practical, portable strategy for improving reasoning stability in ambiguity-heavy clinical NLP tasks and enables safe deployment within privacy-restricted healthcare environments.
Speaker(s):
Dukyong Yoon, MD, PhD
Yonsei University College of Medicine
Author(s):
Dukyong Yoon, MD, PhD - Yonsei University College of Medicine;
Ayesha Abeer, MBBS - Nishtar Medical University;
Vinayak Mathur, BE - JSPH International Boston Institute For Global Public Health INC;
Leo Anthony Celi, MD
Can AI Augmentation Exist Without Eventual Replacement? A Framework for Understanding How Assistive Tools Become Substitutive.
Presentation Type: Oral Presentation - Regular
Presentation Time: 02:48 PM - 03:00 PM
Abstract Keywords: Ethics, Human Factors and Usability, Workforce Automation, Communication, and Workflow Efficiency
Primary Track: Advancing Wellness for Providers and Community with Consideration of Human Factors
Artificial intelligence tools are often described as augmenting clinicians, yet emerging evidence shows that augmentation can degrade skills and create dependence. Through a cross-disciplinary synthesis, we identify seven mechanisms of “augmentation drift” and introduce the nine-domain Augmentation Stability Evaluation Grid (ASEG) to assess drift risk. This framework helps informatics leaders anticipate when assistive AI may become substitutive and supports safer, more sustainable clinical AI deployment.
Speaker(s):
Tanner Dean, DO
Intermountain Health
Author(s):
John Symons, PhD - The University of Kansas;
Tanner Dean, DO - Intermountain Health