11/18/2025 | 9:45 AM – 11:00 AM | Room 6
S69: Mission: Interoperable – Extract, Transform, Lead
Presentation Type: Oral Presentations
A General-Purpose Data Harmonization Framework: Supporting Reproducible and Scalable Data Integration in the RADx Data Hub
Presentation Time: 09:45 AM - 09:57 AM
Abstract Keywords: Interoperability and Health Information Exchange, Data Standards, Data transformation/ETL, Knowledge Representation and Information Modeling, Informatics Implementation, Data Sharing
Primary Track: Foundations
Programmatic Theme: Public Health Informatics
In the age of big data, it is important for primary research data to follow the FAIR principles of findability, accessibility, interoperability, and reusability. Data harmonization enhances interoperability and reusability by aligning heterogeneous data under standardized representations, benefiting both repository curators responsible for upholding data quality standards and consumers who require unified datasets. However, data harmonization is difficult in practice, requiring significant domain and technical expertise. We present a software framework to facilitate principled and reproducible harmonization protocols. Our framework implements a novel strategy of building harmonization transformations from parameterizable primitive operations, such as the assignment of numerical values to user-specified categories, with automated bookkeeping for executed transformations. We establish our data representation model and harmonization strategy and then report a proof-of-concept application in the context of the RADx Data Hub. Our framework enables data practitioners to execute transparent and reproducible harmonization protocols that align closely with their research goals.
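The idea of building harmonization steps from parameterizable primitives with automated bookkeeping can be illustrated with a minimal Python sketch. This is not the authors' framework or its API: the names `Harmonizer`, `apply`, and `bin_numeric`, as well as the age categories, are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class Harmonizer:
    """Hypothetical runner: applies a primitive and records what was executed."""
    log: list[dict[str, Any]] = field(default_factory=list)

    def apply(self, name: str, params: dict[str, Any],
              op: Callable[[Any], Any], values: list[Any]) -> list[Any]:
        result = [op(v) for v in values]
        # Bookkeeping: keep the primitive's name and parameters so the
        # transformation can be inspected and re-run later.
        self.log.append({"primitive": name, "params": params,
                         "n_values": len(values)})
        return result


def bin_numeric(edges: list[float], labels: list[str]) -> Callable[[float], str]:
    """Hypothetical primitive: map a number to a user-specified category.

    A value below edges[i] (and not below any earlier edge) gets labels[i];
    values at or above the last edge get the final label.
    """
    if len(labels) != len(edges) + 1:
        raise ValueError("need exactly one more label than edges")

    def op(value: float) -> str:
        for edge, label in zip(edges, labels):
            if value < edge:
                return label
        return labels[-1]

    return op


# Illustrative parameters only; they are not taken from the RADx Data Hub.
edges, labels = [18, 65], ["child", "adult", "older adult"]
harmonizer = Harmonizer()
categories = harmonizer.apply(
    "bin_numeric", {"edges": edges, "labels": labels},
    bin_numeric(edges, labels), [4, 18, 42, 70],
)
```

Because the log stores the primitive name and its parameters rather than the data, the same sequence of steps could in principle be replayed on another dataset.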
Speaker:
Jimmy Yu, Ph.D. - Stanford University
Authors:
Jimmy Yu, Ph.D. - Stanford University;
Marcos Martinez-Romero, PhD - Stanford University;
Matthew Horridge, PhD - Stanford University;
Mete Akdogan, PhD - Stanford University;
Mark Musen, MD, PhD - Stanford University;
Harmonizing Medicare Claims Data with OMOP: A Validated ETL Pipeline
Presentation Time: 09:57 AM - 10:09 AM
Abstract Keywords: Data Transformation/ETL, Interoperability and Health Information Exchange, Data Standards
Primary Track: Foundations
Programmatic Theme: Clinical Research Informatics
This study presents a Python-based Extract, Transform, and Load (ETL) pipeline that converts Medicare Limited Data Set (LDS) claims into the Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM). By mapping Medicare LDS tables to fifteen OMOP CDM tables, we achieved minimal data loss. Rigorous validation using the OMOP Data Quality Dashboard indicated a 99% pass rate across more than 1,500 checks, affirming data fidelity. A comparative analysis showed high concordance in demographic traits and clinical conditions between the original and transformed datasets. Although structural constraints and minor syntax errors left some codes unmapped, our approach preserves key administrative details and standardizes healthcare data for large-scale observational research. This scalable, reproducible pipeline addresses critical gaps in Medicare-LDS-to-OMOP conversion, improving data integration for diverse applications in health services research, population health, and policy analysis. Future expansions will incorporate additional clinical details and advanced concept mappings.
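As a rough illustration of the kind of row-level mapping such a pipeline performs, the Python sketch below maps one source record to a few columns of the OMOP PERSON table. It is not the authors' pipeline: the source field names (`bene_id`, `sex_code`, `birth_year`) and the source sex coding are assumptions made for the example. The target column names and the gender concept IDs 8507 (male) and 8532 (female) come from the OMOP CDM and its standard vocabulary, where 0 marks a value with no matching concept.

```python
# Assumed source sex codes mapped to standard OMOP gender concept IDs.
GENDER_CONCEPTS = {"1": 8507, "2": 8532}


def to_omop_person(row: dict) -> dict:
    """Map one hypothetical source record to a subset of OMOP PERSON columns."""
    return {
        "person_id": int(row["bene_id"]),
        # Unmapped source codes fall back to concept 0 (no matching concept).
        "gender_concept_id": GENDER_CONCEPTS.get(row["sex_code"], 0),
        "year_of_birth": int(row["birth_year"]),
        # Keep the original code so unmapped values remain auditable.
        "gender_source_value": row["sex_code"],
    }


person = to_omop_person({"bene_id": "1001", "sex_code": "2", "birth_year": "1950"})
unmapped = to_omop_person({"bene_id": "1002", "sex_code": "9", "birth_year": "1948"})
```

Retaining the source value alongside the concept ID is one way unmapped codes, like those the abstract mentions, can be counted and reviewed after the load.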
Speaker:
Yao An Lee, Master of Science - University of Florida, Department of Pharmaceutical Outcomes & Policy
Authors:
Serena Jingchuan Guo, MD, PhD - University of Florida;
Ying Lu, MS - University of Florida;
Xing He, Ph.D. - Indiana University;
Jiang Bian, PhD - Indiana University;
Leveraging Epic’s Native ETL Infrastructure for OMOP CDM Implementation: A Collaborative Experience
Presentation Time: 10:09 AM - 10:21 AM
Abstract Keywords: Data Transformation/ETL, Interoperability and Health Information Exchange, Data Sharing, Data Standards
Primary Track: Applications
Programmatic Theme: Clinical Informatics
The University of Texas Southwestern Medical Center (UTSW) and Texas Health Resources (THR) implemented an Observational Medical Outcomes Partnership (OMOP) common data model (CDM) that uses the Epic electronic health record’s (EHR) extract, transform, and load (ETL) system to enable collaborative research with other health institutions within the OHDSI network. We mapped EHR data from frequently used Epic Models to 25 OMOP CDM tables and transferred the data to a shared OMOP database housed within the Caboodle infrastructure using Epic’s pre-existing ETL system, minimizing the need for customization. ETL processes run weekly at THR and daily at UTSW. OMOP CDM mapping resulted in data quality assessment values of 97% and 98% for THR and UTSW, respectively. Our study established a reproducible, collaborative pipeline using the OMOP CDM with Epic’s native ETL framework, expanding the OHDSI research network and making higher-quality, more generalizable data sets available for future research.
Speaker:
DuWayne Willett, MD - University of Texas Southwestern Medical Center
Authors:
Lauren Cooper, MS - University of Texas Southwestern Medical Center;
Aamirah Vadsariya, RN MSN - UTSW;
Mereeja Varghese;
Bhavini Nayee, BS - University of Texas Southwestern Medical Center;
Jessica Moon, BS - University of Texas Southwestern Medical Center;
Chaitanya Katterapalli, MS - Texas Health Resources;
Clark Walker, MPH - Texas Health Resources;
Chris Gonzalez, LPN - Texas Health Resources;
Sonam Sohal, MHSM MBA - Texas Health Resources;
Christoph Lehmann, MD, FAAP, FACMI, FIAHSI - UT Southwestern;
Ferdinand Velasco, MD - Texas Health Resources;
Mujeeb Basit, MD, MMSc - UT Southwestern Medical Center;
DuWayne Willett, MD - University of Texas Southwestern Medical Center;
i2b2-to-OMOP Common Data Model Conversion Using EHR Data from a Nationwide Community Health Center Network
Presentation Time: 10:21 AM - 10:33 AM
Abstract Keywords: Data Transformation/ETL, Information Extraction, Interoperability and Health Information Exchange, Informatics Implementation
Primary Track: Foundations
Programmatic Theme: Clinical Research Informatics
The NIH-funded AIM-AHEAD consortium aims to increase the representativeness of AI/machine learning datasets. AIM-AHEAD Data Partner OCHIN Inc provides EHR records from a community health center network. To facilitate parallel analyses with other AIM-AHEAD datasets (e.g., All of Us), OCHIN is undergoing an OMOP Common Data Model conversion. Data from 9+ million patients and 290 million encounters have been mapped. Our efforts can inform future large-scale OMOP conversion efforts that use data from low-resource populations.
Speaker:
Taona Haderlein, PhD - OCHIN, Inc
Authors:
Taona Haderlein, PhD - OCHIN, Inc;
Claudia Der-Martirosian, PhD - OCHIN, Inc;
Josh Lemieux, BA - OCHIN;
Robert Schuff, MS - OCHIN;
HL7 FHIR Molecular Definition Resource: Computable Representation of Discrete Clinical Genomic Data
Presentation Time: 10:33 AM - 10:45 AM
Abstract Keywords: Standards, Precision Medicine, Data Standards, Knowledge Representation and Information Modeling, Omics (genomics, metabolomics, proteomics, transcriptomics, etc.) and Integrative Analyses, Interoperability and Health Information Exchange
Primary Track: Foundations
Programmatic Theme: Translational Bioinformatics
Genomic data interoperability is a cornerstone of modern personalized medicine practice and research. The HL7 Clinical Genomics workgroup developed FHIR specifications to support reporting discrete genomic results. The Molecular Definition resource reflects robust domain semantics while providing computable data structures that facilitate clinical decision support and research. We describe the FHIR specification, including profiles (e.g., Sequence, Allele, and Variation), relevant code systems, examples, and guidance.
Speaker:
Robert Freimuth, PhD - Mayo Clinic
Authors:
Aly Khalifa, PhD - Mayo Clinic;
Xianfeng Chen, Ph.D - Mayo Clinic;
Dynamic Querying and Clinical Content Presentation through Agentic Frameworks and HL7 FHIR
Presentation Time: 10:45 AM - 10:57 AM
Abstract Keywords: Information Visualization, Large Language Models (LLMs), Information Retrieval, Documentation Burden
Primary Track: Foundations
Programmatic Theme: Clinical Informatics
This study used a multi-agent LLM approach to dynamically retrieve and present clinical information. Web-based interfaces were generated by the agents from five clinical tasks, with data being selectively retrieved and summarized through FHIR to meet task requirements. While tasks demanding simple retrieval performed well, more complex tasks and calculations highlighted areas for improvement. Findings suggest the potential for visual, integrated solutions that expand beyond text-based clinical content generation.
Speaker:
Robert Barrett, BS - Johns Hopkins University
Authors:
Robert Barrett, BS - Johns Hopkins University;
Nicholas Dobbins, PhD, MLIS - Johns Hopkins University;