Almost Clinical: Linguistic properties of synthetic electronic health records

📅 2026-01-03

🏛️ arXiv.org

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This study evaluates the linguistic fidelity and clinical suitability of synthetic electronic health records (EHRs) generated by large language models (LLMs) in the mental health domain. Across four clinical genres (assessments, correspondence, referrals, and care plans), it examines how synthetic texts express agency, modality, and information flow, and how LLMs grammatically construct medical authority and patient agency. The generated texts are coherent and use appropriate terminology, but they diverge systematically from clinical practice: they shift register, lack clinical specificity, and contain inaccuracies in medication use and diagnostic procedures. The results point to both the potential and the limitations of synthetic corpora for large-scale linguistic research that would otherwise be impossible with genuine patient records.

📝 Abstract

This study evaluates the linguistic and clinical suitability of synthetic electronic health records in mental health. First, we describe the rationale and the methodology for creating the synthetic corpus. Second, we examine expressions of agency, modality, and information flow across four clinical genres (Assessments, Correspondence, Referrals, and Care Plans), with the aim of understanding how LLMs grammatically construct medical authority and patient agency through linguistic choices. While LLMs produce coherent, terminology-appropriate texts that approximate clinical practice, systematic divergences remain, including register shifts, insufficient clinical specificity, and inaccuracies in medication use and diagnostic procedures. The results show both the potential and the limitations of synthetic corpora for enabling large-scale linguistic research that would otherwise be impossible with genuine patient records.

Problem

Research questions and friction points this paper is trying to address.

synthetic electronic health records

linguistic properties

mental health

clinical suitability

patient agency

Innovation

Methods, ideas, or system contributions that make the work stand out.

synthetic electronic health records

large language models

clinical linguistics

patient agency

medical discourse analysis
