A systematic evaluation of large language models for biomedical natural language processing: benchmarks, baselines, and recommendations
Presentation Time: 04:45 PM - 05:00 PM
Abstract Keywords: Natural Language Processing, Large Language Models (LLMs), Deep Learning
Primary Track: Applications
Programmatic Theme: Clinical Informatics
Despite the potential of Large Language Models (LLMs) in biomedicine, the field lacks baseline performance figures, benchmarks, and recommendations for using LLMs in the biomedical domain. This study makes three contributions. First, it undertakes a comprehensive evaluation to establish the baseline performance of LLMs (GPT-3.5, GPT-4, and LLaMA) across 12 BioNLP datasets encompassing six distinct extractive and generative tasks. Second, it reports a thorough manual validation of thousands of sample outputs. Third, it offers suggestions for the effective use of LLMs in BioNLP applications.
Speaker(s):
Qingyu Chen, PhD
Yale University
Category
Podium Abstract