Enhancing Large Language Models with Domain-specific Retrieval-Augmented Generation: A Case Study on Long-Form Medical Question Answering in Ophthalmology
Presentation Time: 09:15 AM - 09:30 AM
Abstract Keywords: Information Retrieval, Large Language Models (LLMs), Data Mining
Primary Track: Applications
Programmatic Theme: Consumer Health Informatics
Objectives:
Despite the potential of Large Language Models (LLMs) in medicine, they may generate responses that lack supporting evidence or that rest on hallucinated evidence. This study develops a domain-specific Retrieval-Augmented Generation (RAG) approach and systematically evaluates response accuracy and completeness, as well as evidence factuality, selection, and attribution.
Materials and Methods:
We conducted a case study on long-form question answering in ophthalmology and developed a RAG pipeline over ~70,000 ophthalmology-specific documents. Ten clinicians compared LLM responses generated with and without RAG on 100 consumer health questions.
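For illustration, the sketch below shows the general shape of such a pipeline: rank corpus passages by similarity to the question and prepend the top hits to the prompt so the model can cite them. The TF-IDF retriever, corpus, question, and prompt template are assumptions for the sketch; the study's actual retriever, index, and prompts are not described here.

```python
# Minimal RAG retrieval sketch (illustrative; not the study's pipeline).
# Requires scikit-learn. The corpus, question, and prompt template below are
# hypothetical stand-ins for the ~70,000-document ophthalmology index.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

corpus = [
    "Cataract surgery replaces the clouded lens with an artificial intraocular lens.",
    "Primary open-angle glaucoma is commonly managed with pressure-lowering eye drops.",
    "Age-related macular degeneration affects central vision in older adults.",
]

def retrieve(query: str, documents: list[str], k: int = 2) -> list[str]:
    """Rank documents by TF-IDF cosine similarity to the query; return top k."""
    vectorizer = TfidfVectorizer().fit(documents + [query])
    scores = cosine_similarity(
        vectorizer.transform([query]), vectorizer.transform(documents)
    )[0]
    return [documents[i] for i in scores.argsort()[::-1][:k]]

question = "What are the treatment options for glaucoma?"
passages = retrieve(question, corpus)

# Retrieved passages are numbered and prepended so the LLM can cite them.
prompt = (
    "Answer using only the numbered evidence below, citing the passages used.\n\n"
    + "\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    + f"\n\nQuestion: {question}"
)
print(prompt)
```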
Results:
Without RAG, 45.3% of the 252 references in the responses were hallucinated and 34.1% were erroneous. With RAG, the proportion of hallucinated references dropped to 18.8%. However, only 62.5% of the documents retrieved by RAG were selected as the top references in the LLM responses. In addition, RAG significantly improved evidence attribution from 1.85 to 2.49 (on a scale of 1 to 5), with slight decreases in accuracy and completeness.
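As an aside, a simple automated screen for hallucinated references might fuzzy-match each cited title against an index of corpus titles, as sketched below. The index, threshold, and helper function are assumptions for illustration; the study's evaluation relied on clinician review, not this heuristic.

```python
# Illustrative flag for hallucinated references (the study itself relied on
# clinician review, not this heuristic). Fuzzy-matches each cited title
# against a hypothetical index of titles from the retrieval corpus.
import difflib

known_titles = {
    "management of primary open-angle glaucoma",
    "intraocular lens selection in cataract surgery",
}

def is_hallucinated(cited_title: str, index: set[str], cutoff: float = 0.8) -> bool:
    """Return True if no sufficiently similar title exists in the index."""
    return not difflib.get_close_matches(cited_title.lower(), index, n=1, cutoff=cutoff)

for title in [
    "Management of primary open angle glaucoma",  # near match -> grounded
    "Gene therapy cures all retinal disease",     # no match  -> flagged
]:
    print(f"{title!r}: hallucinated={is_hallucinated(title, known_titles)}")
```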
Discussion:
LLM responses contained prevalent hallucinated and erroneous evidence. RAG substantially reduced the proportion of such evidence but introduced its own challenges. The results highlight that (1) LLMs may not select the documents retrieved by RAG, (2) LLMs may miss the top-ranked retrieved documents, and (3) irrelevant retrieved documents degrade response accuracy and completeness, especially on challenging questions.
Conclusion:
Despite their potential, LLMs in medicine require improved evidence factuality and relevance. In this case study of long-form medical question answering, RAG proved effective but encountered challenges, highlighting the need for further development of domain-specific LLM and RAG techniques.
Speaker(s):
Qingyu Chen, PhD
Yale University
Category: Podium Abstract