A General-Purpose Data Harmonization Framework: Supporting Reproducible and Scalable Data Integration in the RADx Data Hub
Presentation Time: 09:45 AM - 09:57 AM
Abstract Keywords: Interoperability and Health Information Exchange, Data Standards, Data transformation/ETL, Knowledge Representation and Information Modeling, Informatics Implementation, Data Sharing
Primary Track: Foundations
Programmatic Theme: Public Health Informatics
In the age of big data, it is important for primary research data to follow the FAIR principles of findability, accessibility, interoperability, and reusability. Data harmonization enhances interoperability and reusability by aligning heterogeneous data under standardized representations, benefiting both repository curators responsible for upholding data quality standards and consumers who require unified datasets. However, data harmonization is difficult in practice, requiring significant domain and technical expertise. We present a software framework to facilitate principled and reproducible harmonization protocols. Our framework implements a novel strategy of building harmonization transformations from parameterizable primitive operations, such as the assignment of numerical values to user-specified categories, with automated bookkeeping for executed transformations. We establish our data representation model and harmonization strategy and then report a proof-of-concept application in the context of the RADx Data Hub. Our framework enables data practitioners to execute transparent and reproducible harmonization protocols that align closely with their research goals.
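The strategy described in the abstract, composing harmonization transformations from parameterizable primitives while recording what was executed, can be illustrated with a minimal Python sketch. This is not the authors' framework or its API: the names `Bin` and `Harmonizer`, the inclusive range convention, and the sample `age` data are all assumptions made for illustration only.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Optional


@dataclass(frozen=True)
class Bin:
    """Hypothetical primitive: map a number to the label of the first matching range."""

    # Each entry is (label, inclusive lower bound, inclusive upper bound).
    ranges: tuple

    def __call__(self, value: float) -> Optional[str]:
        for label, low, high in self.ranges:
            if low <= value <= high:
                return label
        return None  # value falls outside every user-specified category


@dataclass
class Harmonizer:
    """Applies primitives to a column and records each executed step (illustrative)."""

    log: list = field(default_factory=list)

    def apply(self, rows: list, source: str, target: str,
              primitive: Callable[[Any], Any]) -> list:
        # Build new rows so the input data is left unmodified.
        out = [{**row, target: primitive(row[source])} for row in rows]
        # Bookkeeping: store the operation together with its parameters.
        self.log.append(
            {"source": source, "target": target, "operation": repr(primitive)}
        )
        return out


# Assumed sample data and categories, chosen only for this example.
rows = [{"age": 7}, {"age": 34}, {"age": 71}]
age_group = Bin(ranges=(("child", 0, 17), ("adult", 18, 64), ("senior", 65, 120)))

harmonizer = Harmonizer()
harmonized = harmonizer.apply(rows, "age", "age_group", age_group)
```

Because the primitive is a frozen dataclass, its parameters appear in the logged `repr`, so in this sketch the log alone is enough to rerun the same transformation on the original rows.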
Speaker(s):
Jimmy Yu, Ph.D.
Stanford University
Author(s):
Jimmy Yu, Ph.D. - Stanford University; Marcos Martinez-Romero, PhD - Stanford University; Matthew Horridge, PhD - Stanford University; Mete Akdogan, PhD - Stanford University; Mark Musen, MD, PhD - Stanford University
Category
Paper - Regular