Evaluating Medical Knowledge in Large Language Models through Probing with the UMLS
Presentation Time: 08:45 AM - 09:00 AM
Abstract Keywords: Natural Language Processing, Large Language Models (LLMs), Diagnostic Systems
Primary Track: Foundations
Programmatic Theme: Clinical Research Informatics
This study investigates how medical knowledge is represented in Large Language Models (LLMs) such as ChatGPT and Llama-2, using the Unified Medical Language System (UMLS) as a benchmark. It introduces a novel probing method that assesses LLMs' ability to predict medical concepts along UMLS-defined knowledge paths. The evaluation, which compares the LLMs against a baseline model using Dice coefficients, shows that ChatGPT performs best at understanding and interpreting medical relationships, albeit with modest F-scores. These findings underscore the potential of UMLS as a resource for evaluating LLMs in medical contexts, highlight the challenges of leveraging LLMs for medical diagnosis, and point to the need for further refinement and tuning of LLMs for biomedical applications.
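The abstract does not specify the exact formulation, but a Dice coefficient over sets of predicted versus reference medical concepts is commonly computed as 2|A ∩ B| / (|A| + |B|). A minimal sketch, with hypothetical UMLS concept identifiers (CUIs) used purely for illustration:

```python
def dice_coefficient(predicted, gold):
    """Dice coefficient between two concept sets: 2*|A & B| / (|A| + |B|)."""
    a, b = set(predicted), set(gold)
    if not a and not b:
        return 1.0  # convention: two empty sets match perfectly
    return 2 * len(a & b) / (len(a) + len(b))

# Hypothetical CUIs: one of two predicted concepts matches the reference set.
score = dice_coefficient({"C0011849", "C0020538"}, {"C0011849", "C0027051"})
print(score)  # 0.5
```

The set-based form gives partial credit when an LLM recovers some but not all concepts on a knowledge path, which suits the modest F-scores reported.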
Speaker(s):
Majid Afshar, MD, MSCR
University of Wisconsin - Madison
Author(s):
Deepak Gupta; Yanjun Gao, PhD - University of Wisconsin - Madison; Emma Croxford, PhD Student - University of Wisconsin - Madison; Majid Afshar, MD, MSCR - University of Wisconsin - Madison; Dina Demner-Fushman, MD - National Library of Medicine
Category
Podium Abstract