Evaluating LLM Abilities to Understand Tabular Electronic Health Records: A Comprehensive Study of Patient Data Extraction and Retrieval

📅 2025-01-16
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This study addresses key challenges in large language model (LLM) understanding of structured electronic health record (EHR) tables—namely, high-dimensional sparsity, implicit contextual dependencies, and low information density. Methodologically, we introduce the first systematic LLM evaluation framework for EHR tables, integrating prompt engineering, instruction tuning, and in-context learning. Building on Llama2 and Meditron, we propose a medical-domain-optimized table serialization strategy and a context-example selection mechanism, benchmarked quantitatively using MIMIC-SQL. Experiments demonstrate that our serialization optimization improves task performance by 26.79%, while curated in-context examples enhance data extraction accuracy by 5.95%. Our contributions include: (1) the first comprehensive evaluation suite for LLM-based EHR table understanding; (2) a reusable methodology for adapting LLMs to clinical structured data; and (3) design guidelines for LLMs tailored to clinical search applications.

Technology Category

Application Category

📝 Abstract
Electronic Health Record (EHR) tables pose unique challenges among which is the presence of hidden contextual dependencies between medical features with a high level of data dimensionality and sparsity. This study presents the first investigation into the abilities of LLMs to comprehend EHRs for patient data extraction and retrieval. We conduct extensive experiments using the MIMICSQL dataset to explore the impact of the prompt structure, instruction, context, and demonstration, of two backbone LLMs, Llama2 and Meditron, based on task performance. Through quantitative and qualitative analyses, our findings show that optimal feature selection and serialization methods can enhance task performance by up to 26.79% compared to naive approaches. Similarly, in-context learning setups with relevant example selection improve data extraction performance by 5.95%. Based on our study findings, we propose guidelines that we believe would help the design of LLM-based models to support health search.
Problem

Research questions and friction points this paper is trying to address.

Language Models
Electronic Health Records
Information Extraction
Innovation

Methods, ideas, or system contributions that make the work stand out.

Language Model
Electronic Health Records
Performance Improvement
🔎 Similar Papers
No similar papers found.
J
Jesús Lovón-Melgarejo
Université Paul Sabatier, IRIT, Toulouse, France
M
Martin Mouysset
Université Paul Sabatier, IRIT, Toulouse, France
J
Jo Oleiwan
Université Paul Sabatier, IRIT, Toulouse, France
J
José G. Moreno
Université Paul Sabatier, IRIT, Toulouse, France
C
Christine Damase-Michel
Centre Hospitalier Universitaire de Toulouse, CERPOP INSERM UMR 1295 - SPHERE team, Faculté de Médecine Université de Toulouse, Toulouse, France
Lynda Tamine
Lynda Tamine
Professor in computer science, University of Toulouse, IRIT lab. , France
Information retrieval