5/19/2026 | 3:30 PM – 4:45 PM | Mt. Elbert A - 555 Building, 2nd Floor
TRI22: Social, Environmental, and Structural Determinants (Oral Presentation)
Presentation Type: Oral Presentations
2026 CIC Health Equity Presentation
Session Credits: 1.25
Evaluating Linkage Approaches for Address-Level Socioenvironmental Exposure Assessment
Presentation Type: Paper - Regular
Presentation Time: 03:30 PM - 03:42 PM
Primary Track: Clinical Research Informatics
Accurate linkage of addresses to parcel-level data is essential for hyperlocal environmental exposure assessment, yet the performance of linkage methods, including their impact on exposure misclassification and bias, remains poorly characterized. Using a gold-standard match of 853,255 National Address Database records to authoritative datasets from Hamilton and Franklin Counties, Ohio, we evaluated address tag fuzzy matching and geocoding-based (geomatching) approaches on the accuracy of the linked parcel identifier, parcel total market value, and parcel usage type. Address tag fuzzy matching achieved 100% agreement; address point geomatching performed moderately well (65.1%–76.1% agreement), and street range geomatching performed poorly (7.2%–59.2%). Poorer agreement was more common in neighborhoods with higher address density and greater community material deprivation, highlighting the potential for differential exposure misclassification. These findings emphasize the need for precise, scalable, and standardized linkage approaches to support valid address- and parcel-level exposure assessment in clinical and population health research.
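The abstract does not specify the fuzzy-matching algorithm. As a minimal sketch under stated assumptions, the code below scores candidate parcels by a weighted per-tag string similarity (Python's standard `difflib.SequenceMatcher`) and then computes percent agreement against a gold-standard linkage. The tag names, weights, 0.9 threshold, and example parcels are all hypothetical and do not represent the authors' method.

```python
from difflib import SequenceMatcher
from typing import Optional

# Hypothetical tagged address: maps a component tag to its value.
Address = dict[str, str]

# Hypothetical per-tag weights (sum to 1.0); house number and street name dominate.
TAG_WEIGHTS = {"house_number": 0.4, "street_name": 0.4, "street_type": 0.1, "zip": 0.1}


def normalize(value: str) -> str:
    """Uppercase and collapse whitespace so cosmetic differences are ignored."""
    return " ".join(value.upper().split())


def tag_similarity(a: Address, b: Address) -> float:
    """Weighted mean of per-tag string similarity, from 0.0 to 1.0."""
    score = 0.0
    for tag, weight in TAG_WEIGHTS.items():
        x, y = normalize(a.get(tag, "")), normalize(b.get(tag, ""))
        score += weight * SequenceMatcher(None, x, y).ratio()
    return score


def best_parcel(query: Address, parcels: dict[str, Address],
                threshold: float = 0.9) -> Optional[str]:
    """Parcel ID whose tagged address best matches the query, or None below threshold."""
    best_id, best_score = None, 0.0
    for parcel_id, parcel_addr in parcels.items():
        score = tag_similarity(query, parcel_addr)
        if score > best_score:
            best_id, best_score = parcel_id, score
    return best_id if best_score >= threshold else None


def agreement(linked: dict[str, Optional[str]], gold: dict[str, str]) -> float:
    """Share of gold-standard records whose linked parcel ID equals the gold parcel ID."""
    hits = sum(1 for rec_id, parcel_id in gold.items() if linked.get(rec_id) == parcel_id)
    return hits / len(gold)


# Hypothetical example parcels and query address.
parcels = {
    "P1": {"house_number": "123", "street_name": "MAIN", "street_type": "ST", "zip": "45202"},
    "P2": {"house_number": "125", "street_name": "MAIN", "street_type": "ST", "zip": "45202"},
}
query = {"house_number": "123", "street_name": "Main", "street_type": "St", "zip": "45202"}
print(best_parcel(query, parcels))  # P1
```

Weighting the house number heavily reflects that a near-miss on the number (123 vs. 125) points to a different parcel, which is exactly the kind of error that geomatching to street ranges can introduce.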
Speaker(s):
Carson Hartlage, BS
University of Cincinnati
Author(s):
Carson Hartlage, BS - University of Cincinnati;
Erika Manning, MS - Cincinnati Children's Hospital Medical Center;
Cole Brokamp, PhD;
Augmenting Missing Individual Social Determinants of Health with Area-Level Social Deprivation Index for Gene–Environment Interaction: Application to Opioid Use Disorder Prediction
Presentation Type: Paper - Student
Student Paper Competition Nominee
Presentation Time: 03:42 PM - 03:54 PM
Primary Track: Translational Bioinformatics/Precision Medicine
Opioid use disorder (OUD) remains a major public health crisis, yet current predictive models often overlook the complex interplay between genetic risk and structural social environments. In this study, we present a multi-level, interpretable risk-stratification framework that integrates polygenic risk scores (PRS), individual-level clinical triggers, and community-level social determinants of health (SDoH) derived from ZIP-code socioeconomic indicators. Using data from the All of Us Research Program, we develop and validate additive and interaction-aware models to quantify how environmental deprivation amplifies or attenuates genetic susceptibility to OUD.
Our analysis reveals five consistent and clinically interpretable risk tiers: (1) proximal clinical triggers, (2) distal structural SDoH, (3) genetic liability, (4) gene–environment interactions (GxE), and (5) protective community anchors. Poverty significantly magnified genetic risk, while community education and income buffered PRS effects, demonstrating both diathesis–stress and resilience dynamics. We further evaluate model performance against black-box machine-learning baselines (XGBoost AUC ≈ 0.90), highlighting the trade-offs between interpretability and predictive performance in high-stakes clinical contexts.
This work provides a scalable informatics framework to operationalize GxE interactions in healthcare datasets and demonstrates how community-level deprivation indices can close major gaps caused by high missingness in individual survey-based SDoH. Our findings offer actionable pathways for risk stratification, policy planning, and equitable implementation of precision medicine for substance use disorders.
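The abstract does not report its model specification. One common way to encode a gene–environment interaction is a product term in a logistic model. In the sketch below, all coefficients are invented for illustration and are not estimates from the study. A positive PRS × poverty coefficient makes the odds ratio per unit of PRS grow with deprivation (the diathesis–stress pattern described above), whereas a negative coefficient on a protective community factor would represent buffering.

```python
import math

# Hypothetical coefficients for illustration only; not estimates from the study.
B0, B_PRS, B_POV, B_GXE = -3.0, 0.30, 0.50, 0.20


def log_odds(prs: float, poverty: float) -> float:
    """Interaction-aware logistic model: logit(p) = b0 + b1*PRS + b2*poverty + b3*PRS*poverty."""
    return B0 + B_PRS * prs + B_POV * poverty + B_GXE * prs * poverty


def risk(prs: float, poverty: float) -> float:
    """Predicted probability obtained by applying the logistic function to the log-odds."""
    return 1.0 / (1.0 + math.exp(-log_odds(prs, poverty)))


def prs_odds_ratio(poverty: float) -> float:
    """Odds ratio per one-unit PRS increase at a given poverty level: exp(b1 + b3*poverty)."""
    return math.exp(B_PRS + B_GXE * poverty)


for pov in (0.0, 1.0, 2.0):
    print(pov, round(prs_odds_ratio(pov), 2))  # 1.35, 1.65, 2.01
```

An additive model is the same sketch with `B_GXE = 0`, in which case the PRS odds ratio stays constant across poverty levels.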
Speaker(s):
Yaxi Yang, Master of Science
Yale University
Author(s):
Jihoon Kim, PhD - Yale University;
Yaxi Yang, Master of Science - Yale University;
Youwen Liu, MS - Yale University;
Evaluating RAG and Non-RAG Pipelines for Concept Discovery in Environmental Health Ontologies
Presentation Type: Paper - Regular
Presentation Time: 03:54 PM - 04:06 PM
Primary Track: Data Science/Artificial Intelligence
The expansion of biomedical ontologies with relevant, high-utility concepts remains a significant challenge in biomedical knowledge representation, particularly for rapidly evolving fields such as Environmental Determinants of Health (EnDOH). In this work, we evaluate the effectiveness of large language models (LLMs) in supporting ontology expansion, comparing Retrieval-Augmented Generation (RAG) with non-RAG concept extraction from the medical literature. Candidate concepts were generated across 15 targeted topics using category-specific prompts. The quality of candidate concepts was assessed through semantic similarity to existing EnDOH concepts and sub-hierarchies. This design enables both a comparative analysis of RAG and non-RAG concept extraction and the identification of topic-level concept alignment with the ontology. Our results quantify the comparative strengths and weaknesses of RAG and non-RAG concept extraction and offer a replicable methodology for extracting potentially useful candidate concepts from the literature for inclusion in biomedical ontologies.
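The abstract does not name the embedding model or the exact similarity measure. Assuming concepts are represented as embedding vectors, one common choice is cosine similarity between a candidate and each existing ontology concept. The toy three-dimensional vectors and concept names below are hypothetical.

```python
import math

Vector = list[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def nearest_concept(candidate: Vector, ontology: dict[str, Vector]) -> tuple[str, float]:
    """Existing ontology concept most similar to the candidate, with its similarity score."""
    return max(((name, cosine(candidate, vec)) for name, vec in ontology.items()),
               key=lambda pair: pair[1])


# Hypothetical embeddings for two existing concepts and one LLM-generated candidate.
ontology = {"Air pollution": [0.9, 0.1, 0.0], "Noise exposure": [0.0, 0.2, 0.9]}
candidate = [0.8, 0.2, 0.1]
print(nearest_concept(candidate, ontology)[0])  # Air pollution
```

Aggregating the best-match scores by topic gives a simple picture of which topics align closely with the ontology and which may contain genuinely new concepts.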
Speaker(s):
Naren Khatwani, PhD Student
New Jersey Institute of Technology
Author(s):
Naren Khatwani, PhD Student - New Jersey Institute of Technology;
Navya Martin Kollapally, PhD in Computer Science - Kean University;
Lijing Wang, PhD - New Jersey Institute of Technology;
James Geller, PhD - NJIT;
Prompt-Tuned Open-Source Large Language Model for Structured Extraction of Social Determinants of Health from Clinical Notes
Presentation Type: Podium Abstract
Presentation Time: 04:06 PM - 04:18 PM
Primary Track: Data Science/Artificial Intelligence
Social determinants of health (SDoH) are critical predictors of health outcomes but are often documented only in free-text clinical notes, limiting their use in healthcare. Existing extraction approaches typically rely on institution-specific rules or fine-tuned transformer models, which can be difficult to update and often treat SDoH as flat labels without attribute-level detail. To address these gaps, we developed a modular, prompt-tuned extraction pipeline using a single zero-shot prompt and a small open-source large language model (Llama 3.1-8B-Instruct) to identify seven SDoH factors (alcohol use, tobacco use, drug use, marital status, sleep, family support, and sexual activity) and their associated attributes. Eighty clinical notes from the Indiana Network for Patient Care were annotated by two expert annotators (κ > 0.8), capturing binary presence and factor-specific attributes such as status, temporality, and key descriptors. The prompt was refined iteratively; no model fine-tuning was performed, so all improvements came from prompt updates. Across all factors, the model achieved macro-precision, macro-recall, and macro-F1 of 0.71, 0.93, and 0.79 at the factor level and 0.71, 0.87, and 0.77 at the attribute level. This work demonstrates that structured SDoH extraction with status and temporality can be accomplished using a lightweight, prompt-only approach, enabling rapid iteration and expansion without retraining. Future work will extend to additional SDoH domains and larger, multi-institutional datasets.
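For readers unfamiliar with macro-averaging, the sketch below shows one standard way to compute it. Per-factor precision, recall, and F1 are calculated first and then averaged with equal weight. For this reason, macro-F1 need not equal the harmonic mean of macro-precision and macro-recall. The true positive / false positive / false negative counts are hypothetical, and the abstract does not state how the authors handled undefined ratios (here they are set to 0.0).

```python
def prf(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Precision, recall, and F1 for one factor (0.0 when a ratio is undefined)."""
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f


def macro_scores(counts: dict[str, tuple[int, int, int]]) -> tuple[float, float, float]:
    """Unweighted mean of per-factor precision, recall, and F1."""
    per_factor = [prf(*c) for c in counts.values()]
    n = len(per_factor)
    return tuple(sum(s[i] for s in per_factor) / n for i in range(3))


# Hypothetical (TP, FP, FN) counts for two of the seven factors.
counts = {"alcohol use": (8, 2, 0), "tobacco use": (6, 4, 1)}
macro_p, macro_r, macro_f1 = macro_scores(counts)
print(macro_p, macro_r, macro_f1)
```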
Speaker(s):
Hao Liu, PhD
Montclair State University
Author(s):
Cheok Long Tang, M.S. - Department of Biomedical Engineering and Informatics, Luddy School of Informatics, Computing, and Engineering;
Yu Huang, Ph.D. - Indiana University;
Jiang Bian, PhD - Indiana University/Regenstrief Institute;
Hao Liu, PhD - Montclair State University;
Yan Zhuang, Ph.D. - Indiana University;
Mapping the Storm: Linking Tornado Paths to Emergency Room Surges Through Geocoded Patient Data
Presentation Type: Paper - Student
Presentation Time: 04:18 PM - 04:30 PM
Primary Track: Clinical Research Informatics
Natural disasters, such as tornadoes, pose significant challenges to public health systems, often resulting in acute increases in emergency department utilization. This study examines healthcare utilization patterns in response to the May 16th St. Louis tornado and the use of geocoding in retrospective and predictive modeling of healthcare demand in the context of severe weather. The primary objectives are to define the tornado's trajectory and affected geographic areas, quantify changes in emergency room (ER) visit volumes during pre-disaster and post-disaster periods, and assess the spatial association between patient proximity to the tornado's path and the likelihood of ER utilization for specific health events. De-identified patient data were extracted from Washington University School of Medicine’s electronic health record system. HIPAA-compliant geocoding was used to convert patient addresses into latitude and longitude coordinates, enabling distance calculations from each patient to the tornado’s path.
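The abstract does not say how distance to the tornado's path was computed. Assuming the path is stored as a polyline of latitude/longitude vertices, a minimal sketch takes the minimum point-to-segment distance after a local equirectangular projection centered on the patient. The coordinates below are hypothetical. The projection is an approximation suited to city-scale distances; a geodesic library would be preferable for long paths.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius

LatLon = tuple[float, float]


def _project(p: LatLon, origin: LatLon) -> tuple[float, float]:
    """Equirectangular projection (km) centered on origin; accurate only over short distances."""
    lat0 = math.radians(origin[0])
    x = math.radians(p[1] - origin[1]) * math.cos(lat0) * EARTH_RADIUS_KM
    y = math.radians(p[0] - origin[0]) * EARTH_RADIUS_KM
    return x, y


def distance_to_path_km(patient: LatLon, path: list[LatLon]) -> float:
    """Approximate minimum distance (km) from a geocoded patient location to a tornado track."""
    if not path:
        raise ValueError("path must contain at least one vertex")
    pts = [_project(v, patient) for v in path]  # patient sits at the origin (0, 0)
    if len(pts) == 1:
        return math.hypot(*pts[0])
    best = math.inf
    for (ax, ay), (bx, by) in zip(pts, pts[1:]):
        dx, dy = bx - ax, by - ay
        seg_len2 = dx * dx + dy * dy
        # Position t of the closest point on segment AB to the origin, clamped to [0, 1].
        t = 0.0 if seg_len2 == 0 else max(0.0, min(1.0, (-ax * dx - ay * dy) / seg_len2))
        best = min(best, math.hypot(ax + t * dx, ay + t * dy))
    return best


# Hypothetical east-west track segment and a patient about 0.01 degrees north of it.
path = [(38.60, -90.30), (38.60, -90.20)]
print(round(distance_to_path_km((38.61, -90.25), path), 2))  # 1.11
```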
Speaker(s):
Katherine Pyasik, working towards BS
Washington University in St. Louis
Author(s):
Katherine Pyasik, working towards BS - Washington University in St. Louis;
Katherine Bieber, BS - Saint Louis University School of Medicine;
Adam Wilcox, PhD - Washington University School of Medicine in St. Louis;
LLM-Assisted Inductive Discovery of Emerging Concerns of Colorectal Cancer Survivors Beyond FACT-C Dimensions from Reddit Posts
Presentation Type: Podium Abstract
Presentation Time: 04:30 PM - 04:42 PM
Primary Track: Clinical Research Informatics
Large language models (LLMs) were employed to assist the inductive discovery of emerging concerns among colorectal cancer (CRC) survivors from 215,337 Reddit posts and comments collected between 2020 and 2024. The aim was to identify needs not covered by the existing dimensions of the Functional Assessment of Cancer Therapy–Colorectal (FACT-C) questionnaire. Several LLMs, including Llama-3 and GPT-4, were evaluated on both deductive and inductive coding. Although performance varied, the LLMs successfully identified meaningful emerging concerns. Four dominant inductive topics consistently surfaced across models: practical support from family, concerns about treatment or diagnosis, cost, and caregiver mental health. This study provides a deeper understanding of CRC survivor perspectives and of unmet needs not captured by FACT-C.
Speaker(s):
Liwei Wang, MD, PhD
UTHealth
Author(s):
Xiaomeng Wang, Master of Science - University of Texas Health Science Center at Houston;
Shuyu Lu, Master - University of Texas Health Science Center at Houston;
Yian Hu, MS - UTHealth Houston;
Nan Wang, Graduate Student - UTH;
Xin Li, Master of Science - UTHealth Houston;
Stella Zhu, NA - East Chapel Hill High School, Chapel Hill, NC;
Dian Hu, PhD - University of Maryland School of Medicine;
Rui Li, PhD - UTHealth;
Heidi Dowst, MS - Baylor College of Medicine;
Hongfang Liu, PhD - University of Texas Health Science Center at Houston;