MeDiSumQA: Patient-Oriented Question-Answer Generation from Discharge Letters

📅 2025-02-05
🤖 AI Summary
Problem: Large language models (LLMs) struggle to generate safe, effective, and patient-friendly medical texts because of wide disparities in patient health literacy, the complexity of clinical terminology, and the absence of standardized evaluation benchmarks. Method: We introduce MeDiSumQA, the first patient-centered question-answering benchmark built from discharge summaries, using a two-stage "LLM auto-generation + human expert validation" paradigm. Starting from MIMIC-IV, an automated pipeline uses GPT-series models to generate QA pairs, which clinical experts then curate. Contribution/Results: Experiments show that general-purpose LLMs significantly outperform biomedical-specialized models on patient-facing QA tasks, and that automated evaluation metrics correlate highly with human judgments (Spearman's ρ > 0.9). MeDiSumQA is publicly released via PhysioNet as a reproducible, scalable benchmark for assessing safety and comprehensibility in patient-centered medical AI.
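For readers unfamiliar with the correlation statistic cited above, here is a minimal sketch of how Spearman's ρ between an automated metric and human ratings can be computed. It is not the authors' evaluation code: the function names `average_ranks` and `spearman_rho` are hypothetical, and the scores in the usage lines are made-up toy values, not results from the paper.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to cover the run of values tied with the one at position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks


def spearman_rho(xs, ys):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    rx, ry = average_ranks(xs), average_ranks(ys)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("rho is undefined when one sequence is constant")
    return cov / (var_x * var_y) ** 0.5


# Toy values for illustration only (NOT data from the paper):
# one automated metric score and one human rating per model answer.
metric_scores = [0.62, 0.71, 0.55, 0.80, 0.67]
human_ratings = [3.4, 3.9, 2.8, 4.5, 3.2]
print(f"Spearman's rho = {spearman_rho(metric_scores, human_ratings):.2f}")
```

Because ρ depends only on rank order, it measures whether the metric and the human raters order the answers the same way, regardless of the scales each one uses.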

📝 Abstract
While increasing patients' access to medical documents improves medical care, this benefit is limited by varying health literacy levels and complex medical terminology. Large language models (LLMs) offer solutions by simplifying medical information. However, evaluating LLMs for safe and patient-friendly text generation is difficult due to the lack of standardized evaluation resources. To fill this gap, we developed MeDiSumQA. MeDiSumQA is a dataset created from MIMIC-IV discharge summaries through an automated pipeline combining LLM-based question-answer generation with manual quality checks. We use this dataset to evaluate various LLMs on patient-oriented question-answering. Our findings reveal that general-purpose LLMs frequently surpass biomedical-adapted models, while automated metrics correlate with human judgment. By releasing MeDiSumQA on PhysioNet, we aim to advance the development of LLMs to enhance patient understanding and ultimately improve care outcomes.
Problem

Research questions and friction points this paper is trying to address.

Simplify medical discharge letters for patients
Evaluate large language models for patient-friendly text
Create a dataset to improve medical understanding
Innovation

Methods, ideas, or system contributions that make the work stand out.

LLM-based question-answer generation
Automated pipeline with manual checks
Dataset release for model evaluation
Amin Dada
Institute for AI in Medicine (IKIM), University Hospital Essen
Osman Alperen Koras
Institute for AI in Medicine (IKIM), University Hospital Essen, Germany
Marie Bauer
Software Developer, Institute for AI in Medicine, Essen
NLP, ML, Computational Linguistics, Linguistics
Amanda Butler
NVIDIA, Santa Clara, USA
Kaleb E. Smith
Nvidia
Machine learning, Generative Models, Deep Learning, Computer Vision, Time Series Analysis
J. Kleesiek
Institute for AI in Medicine (IKIM), University Hospital Essen, Germany; Cancer Research Center Cologne Essen (CCCE), University Medicine Essen, Germany; German Cancer Consortium (DKTK, Partner site Essen), Germany; Department of Physics, TU Dortmund, Germany
Julian Friedrich
Institute for AI in Medicine (IKIM), University Hospital Essen, Germany