Almost Clinical: Linguistic properties of synthetic electronic health records

📅 2026-01-03
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study presents the first systematic linguistic evaluation of synthetic electronic health records (EHRs) generated by large language models (LLMs) in the domain of mental health, assessing their linguistic fidelity and clinical appropriateness. Focusing on four clinical genres (assessments, correspondence, referrals, and care plans), the analysis uses qualitative and quantitative methods to examine how well synthetic texts replicate authentic clinical discourse in terms of agency, modality, and information flow. The findings indicate that the generated texts approximate real clinical writing in terminology and coherence but deviate systematically in register consistency, clinical specificity, and the accuracy of medication use and diagnostic procedures. These results reveal a persistent register shift in current LLM-generated medical texts and offer linguistic insights relevant to the responsible clinical deployment of synthetic EHRs.

📝 Abstract
This study evaluates the linguistic and clinical suitability of synthetic electronic health records in mental health. First, we describe the rationale and the methodology for creating the synthetic corpus. Second, we examine expressions of agency, modality, and information flow across four clinical genres (Assessments, Correspondence, Referrals and Care plans) with the aim of understanding how LLMs grammatically construct medical authority and patient agency through linguistic choices. While LLMs produce coherent, terminology-appropriate texts that approximate clinical practice, systematic divergences remain, including registerial shifts, insufficient clinical specificity, and inaccuracies in medication use and diagnostic procedures. The results show both the potential and the limitations of synthetic corpora for enabling large-scale linguistic research that would otherwise be impossible with genuine patient records.

Problem

Research questions and friction points this paper is trying to address.

synthetic electronic health records
linguistic properties
mental health
clinical suitability
patient agency

Innovation

Methods, ideas, or system contributions that make the work stand out.

synthetic electronic health records
large language models
clinical linguistics
patient agency
medical discourse analysis