Synthetic Data Augmentation Enhance Disease Named Entity Recognition
Poster Number: P105
Presentation Time: 05:00 PM - 06:30 PM
Abstract Keywords: Information Extraction, Natural Language Processing, Large Language Models (LLMs), Machine Learning
Primary Track: Applications
Programmatic Theme: Clinical Research Informatics
Synthetically generated labelled examples from language models have the potential to address coverage gaps and imbalances in training corpora. We use prompts sourced from the Unified Medical Language System (UMLS) with a ChatGPT model to generate clinical mentions of diseases, and show a small but significant improvement when the synthetic text is included.
Speaker(s):
John Osborne, PhD
University of Alabama at Birmingham
Author(s):
Kuleen Sasse;
Category
Poster - Regular
Description
Date: Tuesday (11/12)
Time: 05:00 PM to 06:30 PM
Room: Grand Ballroom (Posters)