Benchmarking Retrieval-Augmented Generation for Medicine
Presentation Time: 09:00 AM - 09:15 AM
Abstract Keywords: Large Language Models (LLMs), Evaluation, Natural Language Processing
Primary Track: Applications
Retrieval-augmented generation (RAG) is a promising solution to the problems of hallucination and outdated knowledge in large language models, but best practices for configuring RAG across different medical applications are lacking. We propose MIRAGE, a first-of-its-kind benchmark, to systematically evaluate medical RAG systems. We conducted large-scale experiments on MIRAGE using our MedRAG toolkit and, based on these comprehensive evaluations, provide practical guidelines for future implementations.
Speaker(s):
Guangzhi Xiong, BA
University of Virginia
Author(s):
Guangzhi Xiong, BA - University of Virginia; Qiao Jin, MD - National Institutes of Health; Zhiyong Lu, PhD - National Library of Medicine, NIH; Aidong Zhang, PhD - University of Virginia
Category
Podium Abstract