Synthetic medical data generation: state of the art and application to trauma mechanism classification

📅 2025-08-04
🤖 AI Summary
To address the tension between patient privacy protection and research reproducibility in medical data sharing, this paper proposes a multimodal synthetic data generation framework for automated trauma mechanism classification. The framework integrates generative adversarial networks (GANs), variational autoencoders (VAEs), and large language models (LLMs) to jointly model structured clinical variables and unstructured free-text narratives while maintaining cross-modal semantic consistency. The synthetic data are evaluated with discriminative metrics and statistical fidelity assessments. Results indicate that the generated data preserve the original distributional characteristics and improve downstream classification performance, with an average accuracy gain of 6.2%. This work presents the first controllable, joint synthesis of clinical tabular and textual data, establishing a high-quality, reproducible data infrastructure for privacy-sensitive medical AI development.

📝 Abstract
Faced with the challenges of patient confidentiality and scientific reproducibility, research on machine learning for health is turning towards the design of synthetic medical databases. This article presents a brief overview of state-of-the-art machine learning methods for generating synthetic tabular and textual data, focusing on their application to the automatic classification of trauma mechanisms, followed by our proposed methodology for generating high-quality synthetic medical records that combine tabular and unstructured text data.
Problem

Research questions and friction points this paper is trying to address.

Generating synthetic medical data for privacy and reproducibility
Applying machine learning to classify trauma mechanisms
Combining tabular and text data in synthetic records
Innovation

Methods, ideas, or system contributions that make the work stand out.

Generating synthetic medical data for privacy
Combining tabular and text data generation
Applying machine learning to trauma classification
Océane Doremus
AHeaD Team, Université de Bordeaux, INSERM, BPH, U1219, F-33000 Bordeaux, France.
Ariel Guerra-Adames
AHeaD Team, Université de Bordeaux, INSERM, BPH, U1219, F-33000 Bordeaux, France.
Marta Avalos-Fernandez
SISTM Team, Université de Bordeaux, INSERM, INRIA, BPH, U1219, F-33000 Bordeaux, France.
Vianney Jouhet
CHU de Bordeaux, INSERM, U1219, F-33000 Bordeaux, France.
Cédric Gil-Jardiné
CHU de Bordeaux, INSERM, U1219, F-33000 Bordeaux, France.
Emmanuel Lagarde
INSERM