Comparative Ranking of Marginal Confounding Impact of Natural Language Processing-Derived Versus Structured Features in Pharmacoepidemiology
Presentation Time: 09:30 AM - 09:45 AM
Abstract Keywords: Causal Inference, Natural Language Processing, Real-World Evidence Generation
Primary Track: Foundations
Objective: To explore the ability of natural language processing (NLP) methods to identify confounder information beyond what can be identified using claims codes alone in pharmacoepidemiology.
Methods: We developed a retrospective cohort of patients with a history of peptic ulcer disease receiving high- versus low-dose proton pump inhibitors, using linked Medicare claims (2008-2017) and clinical data. Clinical notes authored in the year prior to cohort entry were processed with three NLP tools: bag-of-n-grams, MTERMS, and clustered BERT sentence embeddings. Candidate features were ranked using the Bross formula.
Results: Of the top 100 ranked features, 75% were structured (including 19 prespecified covariates) and 25% were NLP-derived (across all three tools).
Conclusions: The Bross formula is a simple way to rank the marginal confounding impact of binary features on estimated causal effects. NLP (especially n-grams) identified many features that can supplement claims data and prespecified variables with additional confounder information.
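The ranking step above can be sketched with the classic Bross (1966) bias multiplier for a binary confounder: the apparent exposure-outcome risk ratio equals the true risk ratio times a factor determined by the confounder's prevalence in each exposure arm and its association with the outcome. This is a minimal illustration, not the authors' implementation; the feature names and prevalence values are hypothetical.

```python
def bross_bias_multiplier(p_exposed: float,
                          p_unexposed: float,
                          rr_confounder_outcome: float) -> float:
    """Bross (1966) bias multiplier for a binary confounder.

    p_exposed / p_unexposed: prevalence of the confounder among the
    exposed and unexposed groups. rr_confounder_outcome: risk ratio
    relating the confounder to the outcome. The observed risk ratio
    equals the true risk ratio times this multiplier.
    """
    return ((p_exposed * (rr_confounder_outcome - 1) + 1)
            / (p_unexposed * (rr_confounder_outcome - 1) + 1))


# Hypothetical candidate features: (p_exposed, p_unexposed, RR).
candidates = {
    "feature_a": (0.40, 0.20, 2.0),
    "feature_b": (0.30, 0.28, 1.5),
}

# Rank by how far each feature's multiplier departs from 1 (no bias).
ranked = sorted(
    candidates,
    key=lambda f: abs(bross_bias_multiplier(*candidates[f]) - 1.0),
    reverse=True,
)
```

With these made-up inputs, a feature with a larger prevalence imbalance between arms and a stronger outcome association produces a multiplier farther from 1 and therefore ranks higher as a potential confounder.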
Speaker(s):
Joseph Plasek, PhD
Mass General Brigham
Category
Podium Abstract