11/18/2025 |
8:00 AM – 9:15 AM |
Room 5
S58: The Drug Data Multiverse: Signals, Safety, and Sensitivity
Presentation Type: Oral Presentations
Revisiting Disproportionality: Prescription-Adjusted and TF-IDF-Inspired Metrics for Post-Market ADR Detection
Presentation Time: 08:00 AM - 08:12 AM
Abstract Keywords: Public Health, Personal Health Informatics, Data Mining
Primary Track: Applications
Programmatic Theme: Public Health Informatics
Adverse drug reaction (ADR) detection in post-market surveillance is limited by underreporting and the absence of drug utilization data. This study proposes three signal detection metrics—including a TF-IDF-inspired method (EF-IDF) and two prescription-adjusted measures—to improve pharmacovigilance, using ADHD medications and the FDA Adverse Event Reporting System (FAERS) as a case study. We standardized drug and ADR entities, integrated prescription data from Bloomberg Intelligence, and evaluated performance across 12 ingredients using precision-at-10%. EF-IDF achieved the highest mean precision (0.56), significantly outperforming traditional PRR and prescription-based metrics. Correlation analysis showed that prescription volume negatively influenced all metrics, particularly EF-IDF, underscoring the role of contextual factors in ADR detection. Despite limitations in temporal granularity and the lack of prescription data specific to ADHD use, this work demonstrates the value of bias-aware, data-integrated methods for signal detection. Future directions include temporal modeling and more targeted identification of ADHD-related prescriptions using public data.
Speaker:
Heejun Kim, Ph.D. in Information and Library Science
University of North Texas
Authors:
Ijay Kaz-Onyeakazi, PhD - University of North Texas; Heejun Kim, Ph.D. in Information and Library Science - University of North Texas;
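As a rough illustration of the baseline and the evaluation measure named in the abstract above, the sketch below computes the standard proportional reporting ratio (PRR) from a 2x2 report table and a precision-at-10% score over ranked events. The abstract does not define EF-IDF or the prescription-adjusted measures, so they are not reproduced here. The function names, the counts, and the top-10% cutoff rule (rounding up, at least one item) are assumptions for illustration, not the authors' implementation.

```python
import math


def prr(a: int, b: int, c: int, d: int) -> float:
    """Proportional reporting ratio from a 2x2 report table.

    a: reports with the drug and the event
    b: reports with the drug, without the event
    c: reports with other drugs and the event
    d: reports with other drugs, without the event
    """
    return (a / (a + b)) / (c / (c + d))


def precision_at_percent(scores: dict, known_adrs: set, percent: int = 10) -> float:
    """Share of the top-`percent`% ranked events that are known ADRs."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    # Assumed cutoff rule: round up, and always keep at least one event.
    k = max(1, math.ceil(len(ranked) * percent / 100))
    return sum(event in known_adrs for event in ranked[:k]) / k


# Made-up counts, for illustration only.
example_prr = prr(a=20, b=80, c=100, d=9800)

# Made-up scores for 20 hypothetical events; higher means a stronger signal.
example_scores = {f"event_{i}": float(i) for i in range(20)}
example_precision = precision_at_percent(example_scores, {"event_19", "event_3"})
```

The sketch does not guard against empty cells in the table (for example `c = 0`), which a real analysis would need to handle.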
Developing RxNorm Extension: A Step Toward Global Drug Data Harmonization in Observational Drug Research
Presentation Time: 08:12 AM - 08:24 AM
Abstract Keywords: Interoperability and Health Information Exchange, Real-World Evidence Generation, Controlled Terminologies, Ontologies, and Vocabularies, Data transformation/ETL, Global Health
Primary Track: Applications
Programmatic Theme: Clinical Research Informatics
This paper presents RxNorm Extension (RxE), a standardized drug vocabulary system designed to harmonize drug data across international databases. RxE integrates drugs from multiple national drug repositories, enhancing global drug safety and effectiveness research by standardizing drug representations in disparate drug vocabularies to a structure following RxNorm, a reference standard in the US. We developed an attribute-based mapping approach that improves consistency and reduces manual data processing. Across the 12 vocabularies included, we observe similar dose forms and ingredient usage patterns but many more brand names available worldwide. The quality of RxE depends on the quality assurance of the source vocabulary, where challenges include discrepancies in brand names, poorly structured dosage forms, and faulty dosages. Despite these challenges, RxE has been used in numerous clinical and methodological studies. Future directions focus on expanding coverage, improving mapping automation, and fostering international collaboration to optimize global drug safety and effectiveness efforts.
Speaker:
Anna Ostropolets, PhD
Johnson and Johnson
Authors:
Aleh Zhuk, MD - Odysseus, an EPAM Company; Eduard Korchmar, BS - Odysseus, an EPAM Company; Patrick Ryan, PhD - Janssen Research and Development; Christian Reich, MD - OHDSI;
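The abstract above does not spell out the attribute-based mapping, so the following is only a minimal sketch of the general idea: match a source drug record to a standard concept by comparing normalized attributes rather than free-text names. The attribute set (ingredient, strength, dose form), the normalization rule, and the `STD-...` identifiers are all hypothetical placeholders, not RxE's actual schema or codes.

```python
from typing import NamedTuple, Optional


class DrugAttributes(NamedTuple):
    ingredient: str
    strength: str
    dose_form: str


def attribute_key(drug: DrugAttributes) -> tuple:
    """Normalize case and whitespace so equivalent spellings compare equal."""
    return tuple(" ".join(value.lower().split()) for value in drug)


def build_index(standard: dict) -> dict:
    """Index standard concepts (id -> attributes) by normalized attributes."""
    return {attribute_key(attrs): concept_id for concept_id, attrs in standard.items()}


def map_drug(source: DrugAttributes, index: dict) -> Optional[str]:
    """Return the matching standard concept id, or None if nothing matches."""
    return index.get(attribute_key(source))


# Hypothetical standard concepts; the ids are placeholders, not real codes.
standard_concepts = {
    "STD-0001": DrugAttributes("ibuprofen", "200 mg", "oral tablet"),
    "STD-0002": DrugAttributes("ibuprofen", "400 mg", "oral tablet"),
}
index = build_index(standard_concepts)

mapped = map_drug(DrugAttributes("Ibuprofen", "400  MG", "Oral Tablet"), index)
unmapped = map_drug(DrugAttributes("ibuprofen", "600 mg", "oral tablet"), index)
```

In this sketch a record with no exact attribute match returns `None`; under the approach described in the abstract, such a record would presumably become a candidate for a new extension concept or for manual review.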
Leveraging Multi-Source Data to Resolve Inconsistency Across Pharmacogenomic Datasets in Drug Sensitivity Prediction
Presentation Time: 08:24 AM - 08:36 AM
Abstract Keywords: Bioinformatics, Machine Learning, Artificial Intelligence
Primary Track: Applications
Programmatic Theme: Clinical Research Informatics
Researchers have developed pharmacogenomics datasets for various purposes, such as biomarker identification, yet drug response prediction models often underperform due to dataset inconsistencies. These variations arise from inter-tumoral heterogeneity, experimental conditions, and cell subtype complexity, limiting model generalizability. To address this, we propose a computational model based on Aggregated Learning (AL) to enhance drug response prediction by learning from inconsistencies across multiple datasets. Our model minimizes discrepancies by training on overlapping inconsistent data points from three pharmacogenomic datasets—CCLE, GDSC2, and gCSI. Compared to four baseline methods—Selecting Better (SB), Result Average (RA), Combining Data (CD), and Model Average (MA)—our approach achieved superior performance with lower Mean Absolute Error (MAE) scores: 0.090 (CCLE-GDSC), 0.096 (CCLE-gCSI), and 0.081 (GDSC-gCSI). These results demonstrate that addressing inconsistencies enhances prediction accuracy and generalizability, making our model a promising solution for robust drug response predictions.
Speaker:
Xiaodi Li, Ph.D.
Mayo Clinic
Authors:
Xiaodi Li, Ph.D. - Mayo Clinic; Trisha Das, Ph.D. Student - University of Illinois Urbana-Champaign; Kritib Bhattarai, BS - Luther College; Sivaraman Rajaganapathy, Research Fellow/Ph.D. - Mayo Clinic; Vincent Buchner, BS - Luther College; Yanshan Wang, PhD - University of Pittsburgh; Chang Su, PhD - Weill Cornell Medicine; Lichao Sun, Ph.D. - Lehigh University; Liewei Wang, M.D., Ph.D. - Mayo Clinic; James Cerhan, M.D., Ph.D. - Mayo Clinic; Nansu Zong, Ph.D. - Mayo Clinic;
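For readers unfamiliar with the metric reported in the abstract above, Mean Absolute Error (MAE) is the average absolute difference between observed and predicted responses, and lower is better. The sketch below shows the computation on made-up values; it does not reproduce the Aggregated Learning model or the baselines, and the variable names are illustrative only.

```python
def mean_absolute_error(observed, predicted):
    """Average absolute difference between paired observed and predicted values."""
    if not observed or len(observed) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


# Made-up drug response values for four hypothetical cell line/drug pairs.
observed = [0.50, 0.20, 0.80, 0.40]
model_a = [0.45, 0.30, 0.70, 0.40]
model_b = [0.60, 0.40, 0.60, 0.30]

mae_a = mean_absolute_error(observed, model_a)
mae_b = mean_absolute_error(observed, model_b)
```

Here `model_a` has the lower MAE, so it would be the preferred model on this toy data.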
Predicting Chemotherapy-Related Symptom Deterioration Using Hybrid Deep Learning Architecture
Presentation Time: 08:36 AM - 08:48 AM
Abstract Keywords: Chronic Care Management, Artificial Intelligence, Deep Learning, Telemedicine
Primary Track: Applications
Programmatic Theme: Clinical Informatics
Predicting symptom escalation in chemotherapy patients is essential for proactive intervention and improved clinical outcomes. This study leverages hybrid deep learning architectures, specifically Convolutional Neural Networks with Long Short-Term Memory (CNN-LSTM), to forecast the progression of 12 self-reported symptoms, categorized into physical (e.g., nausea, fatigue, pain) and mental (e.g., anxiety, cognitive impairment, mood changes) groups. The dataset consists of daily self-reported symptom logs from individuals undergoing chemotherapy. Given the high class imbalance—where 84% of cases showed no escalation—symptom data were aggregated into intervals of 3 to 7 days to improve predictive performance and temporal resolution. The CNN-LSTM model combines convolutional layers for extracting patterns within a local time window with LSTM layers for capturing long-term temporal dependencies. The model was trained using five-fold cross-validation to ensure robust generalization. Results indicate that 5-day intervals yielded the highest predictive accuracy for physical symptom prediction, with the CNN-LSTM model achieving an accuracy of 83%, precision of 89%, recall of 86%, F1-score of 88%, and an AUC of 83%. These findings highlight the effectiveness of hybrid deep learning architectures in symptom monitoring and early detection, enabling AI-driven decision support for real-time clinical interventions. Integrating these models into digital health systems could facilitate continuous symptom tracking, enhance predictive accuracy, and improve the quality of care for chemotherapy patients.
Speaker:
AREF SMILEY, Assistant Professor/PhD
The University of Utah
Authors:
Joseph Finkelstein, MD, PhD - University of Utah; AREF SMILEY, Assistant Professor/PhD - The University of Utah; Christina Echeverria, MSc - University of Utah; Kathi Mooney, PhD - University of Utah;
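The interval aggregation step described in the abstract above can be sketched as follows. The abstract does not say how daily logs were combined within an interval, so the choice of reducer (mean by default, or max), the non-overlapping windows, and the handling of a short final window are assumptions made for illustration.

```python
from statistics import mean


def aggregate_intervals(daily_scores, days, reducer=mean):
    """Collapse daily scores into consecutive, non-overlapping `days`-long windows.

    A final window shorter than `days` is kept and reduced on its own.
    """
    if days < 1:
        raise ValueError("days must be at least 1")
    return [
        reducer(daily_scores[start:start + days])
        for start in range(0, len(daily_scores), days)
    ]


# Made-up severity scores for one symptom over ten days.
daily = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

five_day_means = aggregate_intervals(daily, days=5)
three_day_peaks = aggregate_intervals(daily, days=3, reducer=max)
```

Sequences of aggregated values like these would then serve as inputs to a sequence model such as the CNN-LSTM the abstract describes.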
PVLens: Enhancing Pharmacovigilance Through Automated Label Extraction
Presentation Time: 08:48 AM - 09:00 AM
Abstract Keywords: Controlled Terminologies, Ontologies, and Vocabularies, Bioinformatics, Natural Language Processing, Data Standards, Patient Safety
Primary Track: Foundations
Programmatic Theme: Public Health Informatics
Reliable drug safety reference databases are essential for pharmacovigilance, yet existing resources like SIDER are outdated and static. We introduce PVLens, an automated system that extracts labeled safety information from FDA Structured Product Labels (SPLs) and maps terms to MedDRA. PVLens integrates automation with expert oversight through a web-based review tool. In validation against 97 drug labels, PVLens achieved an F1 score of 0.882, with high recall (0.983) and moderate precision (0.799). By offering a scalable, more accurate, and continuously updated alternative to SIDER, PVLens enhances real-time pharmacovigilance with contemporaneous insights.
Speaker:
Jeffery Painter, MS, JD
GSK
Authors:
Gregory Powell, PharmD, MBA - GlaxoSmithKline; Andrew Bate, PhD - GlaxoSmithKline;
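The F1 score quoted in the abstract above is the harmonic mean of precision and recall. As a quick sanity check, the sketch below recomputes it from the reported precision (0.799) and recall (0.983); the result is about 0.8815, in line with the reported 0.882, with the small difference presumably due to rounding of the published inputs. The function name is illustrative.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the abstract.
pvlens_f1 = f1_score(precision=0.799, recall=0.983)
```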
Penalized Regression Based Proteome-Wide Association Study Reveals Potential Population-Specific Drug Targets for Alzheimer's Disease in European and African American Cohorts
Presentation Time: 09:00 AM - 09:10 AM
Abstract Keywords: Bioinformatics, Drug Discoveries, Repurposing, and Side-effect, Machine Learning, Diversity, Equity, Inclusion, and Accessibility, Population Health, Racial disparities
Primary Track: Applications
Programmatic Theme: Translational Bioinformatics
Alzheimer’s disease (AD) is a complex neurodegenerative disorder with significant genetic underpinnings, yet effective treatments remain elusive. To bridge the gap between genetic discoveries and therapeutic development, we conducted a penalized regression based proteome-wide association study (PWAS) in both European and African American populations. Using publicly available GWAS summary statistics and the BLISS model, we identified 37 protein-coding genes significantly associated with AD risk, including APOE and BCAM in both populations. We further applied the GREP model to prioritize repositionable drugs targeting these genes, identifying 30 significant disease-target-drug pairs. Notably, Ramipril and BAY 85-8501 emerged as top candidates for AD treatment in European and African American populations, respectively. These findings highlight ancestry-specific drug targets, demonstrating the importance of diverse genetic studies in AD research and providing novel avenues for therapeutic intervention.
Speaker:
Shuo Shi, MS
Brandeis University
Authors:
Shuo Shi, MS - Brandeis University; Shijun Liu, MS - Carnegie Mellon University; You Liu, MS - University of Michigan; Quanchao Lu, MS - Georgia Institute of Technology;
Developing RxNorm Extension: A Step Toward Global Drug Data Harmonization in Observational Drug Research
Category
Paper - Regular
11/18/2025 09:15 AM (Eastern Time (US & Canada))