Generative AI-Powered Dataset Retrieval Tool
Presentation Time: 05:00 PM - 06:30 PM
Abstract Keywords: Artificial Intelligence, Information Retrieval, Machine Learning, Large Language Models (LLMs)
Primary Track: Applications
Programmatic Theme: Clinical Research Informatics
Retrieving datasets from clinical data warehouses is time-consuming and requires SQL expertise. We developed the Generative AI-Powered Dataset Retrieval Tool (GAPDART), which leverages Azure OpenAI to transform natural language input into executable SQL queries, enabling users to access the data needed for predictive modeling without building queries manually. We used GAPDART to retrieve data for an inpatient hypoglycemia prediction model. We recorded the average time to generate the AI-created SQL, token usage per variable, and data retrieval time from the clinical data warehouse. GAPDART successfully generated valid SQL queries and retrieved data for the six prediction variables and one outcome variable. On average, each query took 14.95 seconds to generate and used approximately 2,689 tokens; retrieving the data from the clinical data warehouse took up to 8 seconds per variable. GAPDART demonstrates the application of generative AI to simplify querying clinical data warehouses.
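The abstract does not describe GAPDART's implementation, so the following is only a minimal sketch of how the three reported metrics (SQL generation time, token usage, and retrieval time per variable) might be recorded. Everything in it is an assumption for illustration: `generate_sql` is a hypothetical stand-in that returns canned SQL instead of calling a language model, the token count is a crude word count rather than a real tokenizer, and the `labs` table with its three rows is toy data in an in-memory SQLite database, not a clinical data warehouse.

```python
import sqlite3
import time
from statistics import mean

# Hypothetical stand-in for the natural-language-to-SQL step.
# A real tool would call a language model here; this stub returns canned SQL.
CANNED_SQL = {
    "minimum glucose per encounter":
        "SELECT encounter_id, MIN(value) FROM labs "
        "WHERE name = 'glucose' GROUP BY encounter_id ORDER BY encounter_id",
    "any glucose below 70 per encounter":
        "SELECT encounter_id, MAX(value < 70) FROM labs "
        "WHERE name = 'glucose' GROUP BY encounter_id ORDER BY encounter_id",
}


def generate_sql(description):
    """Return (sql, tokens_used) for a plain-language variable description."""
    sql = CANNED_SQL[description]
    # Placeholder token count (word count), not a real tokenizer.
    tokens = len(description.split()) + len(sql.split())
    return sql, tokens


def retrieve_variable(conn, description):
    """Generate SQL for one variable, run it, and record the three metrics."""
    t0 = time.perf_counter()
    sql, tokens = generate_sql(description)
    t1 = time.perf_counter()
    rows = conn.execute(sql).fetchall()
    t2 = time.perf_counter()
    return {
        "description": description,
        "rows": rows,
        "tokens": tokens,
        "generation_s": t1 - t0,
        "retrieval_s": t2 - t1,
    }


# Toy in-memory database standing in for the warehouse (invented rows).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE labs (encounter_id INTEGER, name TEXT, value REAL)")
conn.executemany(
    "INSERT INTO labs VALUES (?, ?, ?)",
    [(1, "glucose", 55.0), (1, "glucose", 110.0), (2, "glucose", 90.0)],
)

results = [retrieve_variable(conn, d) for d in CANNED_SQL]

# Summaries analogous to those reported in the abstract.
summary = {
    "mean_generation_s": mean(r["generation_s"] for r in results),
    "mean_tokens": mean(r["tokens"] for r in results),
    "max_retrieval_s": max(r["retrieval_s"] for r in results),
}
print(summary)
```

Timing generation and retrieval separately, as the sketch does, keeps the cost of producing the SQL distinct from the cost of running it, which matches how the abstract reports the two figures.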
Speaker(s):
Aileen Wright, MD
Vanderbilt
Author(s):
Aileen Wright, MD - Vanderbilt; Adam Wright, PhD - Vanderbilt University Medical Center; Peter Embi, MD - VUMC
Category
Poster - Regular