🤖 AI Summary
This study addresses the challenges of accurately querying structured data and extracting information from unstructured clinical text in electronic health records (EHRs). To this end, the authors propose a unified framework that integrates large language models (LLMs) with retrieval-augmented generation (RAG): LLMs are employed to execute structured queries (e.g., Pandas operations), while RAG enhances information extraction from unstructured clinical narratives. The work introduces an automatic evaluation pipeline based on synthetically generated question-answer pairs, combining exact-match metrics, semantic similarity scores, and human assessment. Evaluated on a subset of MIMIC-III, the approach demonstrates improved semantic accuracy and task adaptability, offering clinical data science a flexible and reliable tool for automated querying, extraction, and evaluation.
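The structured-querying side of the framework can be illustrated with a minimal sketch: an LLM is asked to translate a natural-language question into a Pandas expression, which is then executed against the table. The paper does not publish its prompts or models, so `llm_generate_query` below is a hypothetical stand-in that returns a canned expression; a real implementation would call a locally hosted or API-based LLM.

```python
import pandas as pd

def llm_generate_query(question: str, schema: str) -> str:
    """Hypothetical stand-in for an LLM call: given a natural-language
    question and a schema description, return a Pandas expression as a
    string. Here we return a fixed expression for illustration only."""
    return "df.loc[df['diagnosis'] == 'sepsis', 'subject_id'].nunique()"

# Toy stand-in for a structured EHR table (not real MIMIC-III data).
admissions = pd.DataFrame({
    "subject_id": [1, 1, 2, 3],
    "diagnosis": ["sepsis", "pneumonia", "sepsis", "asthma"],
})

schema = "admissions(subject_id, diagnosis)"
code = llm_generate_query(
    "How many distinct patients were diagnosed with sepsis?", schema
)

# Execute the generated expression against the table
# (sandboxing of eval is omitted for brevity).
result = eval(code, {"df": admissions})
print(result)  # 2
```

In practice the generated code would need validation and sandboxing before execution, since an LLM can emit syntactically invalid or unsafe expressions.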
📄 Abstract
This study applies Large Language Models (LLMs) to two foundational Electronic Health Record (EHR) data science tasks: structured data querying (using programmatic languages, Python/Pandas) and information extraction from unstructured clinical text via a Retrieval-Augmented Generation (RAG) pipeline. We test the ability of LLMs to interact accurately with large structured datasets for analytics, and the reliability of LLMs in extracting semantically correct information from free-text health records when supported by RAG. To this end, we present a flexible evaluation framework that automatically generates synthetic question-answer pairs tailored to the characteristics of each dataset or task. Experiments were conducted on a curated subset of MIMIC-III (four structured tables and one clinical note type), using a mix of locally hosted and API-based LLMs. Evaluation combined exact-match metrics, semantic similarity, and human judgment. Our findings demonstrate the potential of LLMs to support precise querying and accurate information extraction in clinical workflows.
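The evaluation described above combines exact match with a semantic score over synthetic question-answer pairs. The following is a minimal sketch of that idea; the paper's actual semantic metric is not specified here, so `semantic_similarity` uses `difflib.SequenceMatcher` as a lightweight stand-in for an embedding-based (e.g., cosine) similarity, and the threshold of 0.8 is an assumed parameter.

```python
from difflib import SequenceMatcher

def exact_match(pred: str, gold: str) -> bool:
    # Normalize whitespace and case before comparing.
    return pred.strip().lower() == gold.strip().lower()

def semantic_similarity(pred: str, gold: str) -> float:
    # Stand-in for an embedding-based similarity score; a real pipeline
    # would compare sentence embeddings rather than character overlap.
    return SequenceMatcher(None, pred.strip().lower(), gold.strip().lower()).ratio()

def evaluate(pairs, sim_threshold=0.8):
    """pairs: list of (model prediction, gold answer) tuples drawn from
    the synthetic QA set. Returns corpus-level accuracy under each metric."""
    em = sum(exact_match(p, g) for p, g in pairs) / len(pairs)
    sem = sum(semantic_similarity(p, g) >= sim_threshold for p, g in pairs) / len(pairs)
    return {"exact_match": em, "semantic_accuracy": sem}

# Toy example: the second pair fails exact match but could pass a
# sufficiently tolerant semantic check.
pairs = [
    ("Heart failure", "heart failure"),
    ("The patient has diabetes", "diabetes mellitus"),
]
print(evaluate(pairs))
```

A human-judgment pass, as used in the paper, would then adjudicate cases where the two automatic metrics disagree.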