AMIA 2024 Annual Symposium

Enhancing Large Language Models with Domain-specific Retrieval Augment Generation: A Case Study on Long-Form Medical Question Answering in Ophthalmology

Presentation Time: 09:15 AM - 09:30 AM

Abstract Keywords: Information Retrieval, Large Language Models (LLMs), Data Mining
Primary Track: Applications
Programmatic Theme: Consumer Health Informatics

Objectives:
Despite the potential of Large Language Models (LLMs) in medicine, they may generate responses that lack supporting evidence or rely on hallucinated evidence. This study develops a domain-specific Retrieval-Augmented Generation (RAG) approach and systematically evaluates response accuracy and completeness as well as evidence factuality, selection, and attribution.

Materials and Methods:
We conducted a case study on long-form question answering in ophthalmology, developing a RAG pipeline over ~70,000 ophthalmology-specific documents. Ten clinicians evaluated LLM responses with and without RAG on 100 consumer health questions.
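The abstract does not say how the pipeline retrieves documents or assembles prompts. As a rough illustration of the general RAG pattern only, the sketch below substitutes a toy TF-IDF retriever over an in-memory corpus. The class `TfidfRetriever`, the function `build_prompt`, the prompt wording, and the sample documents are all hypothetical; a pipeline over ~70,000 documents would typically rely on a dedicated search index or embedding model instead.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lowercase and keep alphanumeric word tokens only.
    return re.findall(r"[a-z0-9]+", text.lower())


class TfidfRetriever:
    """Hypothetical toy lexical retriever; not the study's retrieval component."""

    def __init__(self, docs: dict[str, str]):
        self.doc_ids = list(docs)
        counts = [Counter(tokenize(docs[d])) for d in self.doc_ids]
        n = len(self.doc_ids)
        df = Counter()
        for c in counts:
            df.update(c.keys())
        # Smoothed inverse document frequency per term.
        self.idf = {t: math.log((1 + n) / (1 + f)) + 1 for t, f in df.items()}
        self.doc_vecs = [self._weight(c) for c in counts]

    def _weight(self, counts: Counter) -> dict[str, float]:
        # TF-IDF weights, L2-normalized; unseen terms get weight 0.
        vec = {t: c * self.idf.get(t, 0.0) for t, c in counts.items()}
        norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
        return {t: v / norm for t, v in vec.items()}

    def search(self, query: str, k: int = 3) -> list[tuple[str, float]]:
        # Cosine similarity between the query vector and each document vector.
        q = self._weight(Counter(tokenize(query)))
        scored = [
            (doc_id, sum(w * vec.get(t, 0.0) for t, w in q.items()))
            for doc_id, vec in zip(self.doc_ids, self.doc_vecs)
        ]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:k]


def build_prompt(question: str, retrieved: list[tuple[str, float]],
                 docs: dict[str, str]) -> str:
    # Number each retrieved document so the model can cite it as [1], [2], ...
    context = "\n".join(
        f"[{i}] ({doc_id}) {docs[doc_id]}"
        for i, (doc_id, _) in enumerate(retrieved, start=1)
    )
    return (
        "Answer the question using only the numbered documents below "
        "and cite them by number.\n\n"
        f"{context}\n\nQuestion: {question}\nAnswer:"
    )
```

Numbering the retrieved documents in the prompt is one common way to make citations traceable back to retrieval results, which is what allows evidence selection and attribution to be checked afterwards.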

Results:
Without RAG, 45.3% of the 252 references in the LLM responses were hallucinated and 34.1% were erroneous. With RAG, the proportion of hallucinated references fell to 18.8%. However, only 62.5% of the documents retrieved by RAG were selected as the top references in the LLM responses. In addition, RAG significantly improved evidence attribution from 1.85 to 2.49 (on a scale from 1 to 5), with slight decreases in accuracy and completeness.
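The reference-level results are proportions over labeled references plus a rate at which retrieved documents are selected. The sketch below shows one way such figures could be computed from clinician labels. The `Reference` type, the label names, and the definition of selection rate used here (the share of retrieved documents that the response cites) are assumptions, and the toy data below is invented, not taken from the study.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Reference:
    citation: str
    label: str  # hypothetical labels: "correct", "erroneous", "hallucinated"


def label_proportions(refs: list[Reference]) -> dict[str, float]:
    # Share of references carrying each label.
    counts = Counter(r.label for r in refs)
    total = len(refs)
    return {label: n / total for label, n in counts.items()}


def selection_rate(retrieved_ids: list[str], cited_ids: list[str]) -> float:
    # Assumed definition: fraction of retrieved documents the response cites.
    retrieved = set(retrieved_ids)
    if not retrieved:
        return 0.0
    return len(retrieved & set(cited_ids)) / len(retrieved)


# Toy usage with invented data.
refs = [
    Reference("Ref A", "correct"),
    Reference("Ref B", "hallucinated"),
    Reference("Ref C", "hallucinated"),
    Reference("Ref D", "erroneous"),
]
print(label_proportions(refs)["hallucinated"])  # 0.5
print(selection_rate(["d1", "d2", "d3", "d4"], ["d2", "d4", "x9"]))  # 0.5
```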

Discussion:
LLM responses frequently contained hallucinated and erroneous evidence. RAG substantially reduced the proportion of such evidence but faced challenges. The results highlight that (1) LLMs may not select the documents retrieved by RAG, (2) LLMs may miss top-ranked retrieved documents, and (3) irrelevant retrieved documents degrade response accuracy and completeness, especially in challenging tasks.
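Finding (2) could be quantified by checking, at each retrieval rank, how often the document at that rank is cited in the response. The function below is a hypothetical sketch of such a check, not the study's analysis: `runs` is an assumed input pairing each question's rank-ordered retrieval list with the document IDs the response cites.

```python
def citation_rate_by_rank(runs: list[tuple[list[str], list[str]]],
                          k: int) -> list[float]:
    """Fraction of questions whose rank-r retrieved document is cited, for r < k."""
    hits = [0] * k
    totals = [0] * k
    for retrieved, cited in runs:
        cited_set = set(cited)
        for rank, doc_id in enumerate(retrieved[:k]):
            totals[rank] += 1
            if doc_id in cited_set:
                hits[rank] += 1
    # Ranks that no question reached report 0.0 rather than dividing by zero.
    return [h / t if t else 0.0 for h, t in zip(hits, totals)]


# Invented example: the rank-1 document is never cited.
runs = [(["a", "b", "c"], ["b", "c"]), (["d", "e", "f"], ["e"])]
print(citation_rate_by_rank(runs, 3))  # [0.0, 1.0, 0.5]
```

A low rate at the first ranks, as in the toy example, would correspond to the pattern described in finding (2).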

Conclusion:
Despite their potential, LLMs in medicine require improved evidence factuality and relevance. In this case study of long-form medical question answering, RAG proved effective but faced challenges, highlighting the need for further development of domain-specific LLM and RAG techniques.

Speaker(s):
Qingyu Chen, PhD
Yale University

Category: Podium Abstract

Date: Tuesday (11/12)
Time: 09:15 AM to 09:30 AM
Room: Franciscan B

© 2026 American Medical Informatics Association. All Rights Reserved.