Privacy-Preserving Generative Modeling and Clinical Validation of Longitudinal Health Records for Chronic Disease

📅 2025-11-29
📈 Citations: 0
Influential: 0

🤖 AI Summary
In chronic disease research, longitudinal health data are difficult to share because of stringent privacy constraints, while existing generative models either handle time-series data poorly or lack formal privacy guarantees. To address this, we propose DP-TimeGAN, a generative framework that combines an enhanced TimeGAN architecture with differential privacy (DP) mechanisms. DP-TimeGAN provides quantifiable ε-differential privacy for synthetic longitudinal clinical time-series data while retaining fidelity. On a chronic kidney disease (CKD) dataset, the generated data maintain a mean authenticity of 0.778, and the method is further assessed with statistical tests and a Train-on-Synthetic-Test-on-Real (TSTR) setup. In expert clinical review, the synthetic data perform comparably to real data. This work advances the privacy-utility frontier for compliant, high-quality medical AI modeling grounded in formal privacy theory.

📝 Abstract
Data privacy is a critical challenge in modern medical workflows as the adoption of electronic patient records has grown rapidly. Stringent data protection regulations limit access to clinical records for training and integrating machine learning models that have shown promise in improving diagnostic accuracy and personalized care outcomes. Synthetic data offers a promising alternative; however, current generative models either struggle with time-series data or lack formal privacy guarantees. In this paper, we enhance a state-of-the-art time-series generative model to better handle longitudinal clinical data while incorporating quantifiable privacy safeguards. Using real data from chronic kidney disease (CKD) and ICU patients, we evaluate our method through statistical tests, a Train-on-Synthetic-Test-on-Real (TSTR) setup, and expert clinical review. Our non-private model (Augmented TimeGAN) outperforms transformer- and flow-based models on statistical metrics in several datasets, while our private model (DP-TimeGAN) maintains a mean authenticity of 0.778 on the CKD dataset, outperforming existing state-of-the-art models on the privacy-utility frontier. Both models achieve performance comparable to real data in clinician evaluations, providing the robust input data needed to develop models for complex chronic conditions without compromising data privacy.
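
The Train-on-Synthetic-Test-on-Real (TSTR) setup named in the abstract can be illustrated with a minimal sketch: a downstream model is fitted only on synthetic records and scored on held-out real records, and the score is compared with a model trained on real data. Everything below is an assumption made for illustration, not the paper's pipeline: the toy cohort generator, the two summary features, and the nearest-centroid classifier are hypothetical stand-ins, and the "synthetic" cohort is simulated rather than produced by a generative model.

```python
import random
from statistics import mean

def summarize(series):
    """Collapse one patient's series into (mean level, net change)."""
    return (mean(series), series[-1] - series[0])

def fit_centroids(features, labels):
    """Nearest-centroid classifier: one mean feature vector per class."""
    centroids = {}
    for label in set(labels):
        rows = [f for f, y in zip(features, labels) if y == label]
        centroids[label] = tuple(mean(col) for col in zip(*rows))
    return centroids

def predict(centroids, feature):
    """Return the class whose centroid is closest in squared distance."""
    def sq_dist(centroid):
        return sum((a - b) ** 2 for a, b in zip(feature, centroid))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))

def accuracy(centroids, features, labels):
    hits = sum(predict(centroids, f) == y for f, y in zip(features, labels))
    return hits / len(labels)

def toy_cohort(n, rng, noise):
    """Hypothetical data: label 1 = declining lab marker, 0 = stable."""
    series, labels = [], []
    for i in range(n):
        label = i % 2
        slope = -2.0 if label else 0.0
        series.append([60.0 + slope * t + rng.gauss(0.0, noise) for t in range(8)])
        labels.append(label)
    return series, labels

rng = random.Random(0)
real_train, y_real_train = toy_cohort(100, rng, noise=1.0)
real_test, y_real_test = toy_cohort(100, rng, noise=1.0)
# Placeholder for generator output; a real study would sample from the model.
synthetic, y_synthetic = toy_cohort(100, rng, noise=1.5)

test_features = [summarize(s) for s in real_test]

# TSTR: train on synthetic records, test on real records.
tstr_model = fit_centroids([summarize(s) for s in synthetic], y_synthetic)
tstr_accuracy = accuracy(tstr_model, test_features, y_real_test)

# Baseline (train on real, test on real) for comparison.
trtr_model = fit_centroids([summarize(s) for s in real_train], y_real_train)
trtr_accuracy = accuracy(trtr_model, test_features, y_real_test)

print(f"TSTR accuracy: {tstr_accuracy:.3f}  TRTR accuracy: {trtr_accuracy:.3f}")
```

A small gap between the two accuracies suggests the synthetic records preserve the signal the downstream task needs; a large gap suggests the generator has lost clinically relevant structure.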
Problem

Research questions and friction points this paper is trying to address.

Generates synthetic longitudinal health records with privacy guarantees
Addresses data scarcity for chronic disease machine learning models
Balances privacy and utility in clinical time-series data generation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Enhanced time-series generative model for clinical data
Incorporated quantifiable privacy safeguards in generative modeling
Achieved high authenticity and privacy in synthetic data generation
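
This listing does not say how the quantifiable privacy safeguards are implemented. One widely used way to train a generative model with differential privacy is DP-SGD-style aggregation: clip each example's gradient to a fixed norm, sum the clipped gradients, and add Gaussian noise scaled to that norm. The sketch below shows that generic step under stated assumptions; the function names and parameters (`max_norm`, `noise_multiplier`) are hypothetical, and whether DP-TimeGAN uses exactly this mechanism is not confirmed here.

```python
import math
import random

def clip(grad, max_norm):
    """Scale a gradient vector so its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]

def private_mean_gradient(per_example_grads, max_norm, noise_multiplier, rng):
    """Clip per-example gradients, sum them, add Gaussian noise, then average.

    Clipping bounds each record's influence on the sum by max_norm, so noise
    with standard deviation noise_multiplier * max_norm masks any one record.
    """
    clipped = [clip(g, max_norm) for g in per_example_grads]
    dim = len(clipped[0])
    summed = [sum(g[i] for g in clipped) for i in range(dim)]
    sigma = noise_multiplier * max_norm
    noisy = [s + rng.gauss(0.0, sigma) for s in summed]
    return [v / len(clipped) for v in noisy]

# Illustrative gradients for a batch of three patient records (made-up values).
batch_grads = [[3.0, 4.0], [0.3, 0.4], [-1.0, 2.0]]
update = private_mean_gradient(
    batch_grads, max_norm=1.0, noise_multiplier=1.1, rng=random.Random(0)
)
print(update)
```

Converting a noise multiplier, batch sampling rate, and number of training steps into a reported ε requires a privacy accountant, which this sketch deliberately omits.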
Benjamin D. Ballyk
Vironix Health Inc, Austin, TX, USA
Ankit Gupta
Vironix Health Inc, Austin, TX, USA
Sujay Konda
Vironix Health Inc, Austin, TX, USA
Kavitha Subramanian
Stanford University, Stanford, CA, USA
Chris Landon
University of Southern California, Los Angeles, CA, USA
Ahmed Ammar Naseer
Vironix Health Inc, Austin, TX, USA
Georg Maierhofer
University of Cambridge
numerical analysis, geometric integration, integral equations, partial differential equations
Sumanth Swaminathan
Vironix Health Inc, Austin, TX, USA
Vasudevan Venkateshwaran
Vironix Health Inc, Austin, TX, USA