🤖 AI Summary
To address the need for privacy-preserving synthetic data generation in highly regulated domains such as finance, this paper proposes what it describes as the first end-to-end framework integrating differential privacy (DP), federated learning (FL), and denoising diffusion probabilistic models (DDPMs). The method introduces a dynamic privacy budget allocation mechanism and a client-aware optimization strategy, which together accommodate heterogeneous device constraints and combine gradient clipping with noise injection to enforce the privacy guarantee. Evaluated on multiple real-world financial datasets under strict DP constraints (ε ≤ 8), the approach achieves downstream task utility exceeding 95% of that obtained with the original data, significantly outperforming the state-of-the-art baselines DP-GAN and DP-FedTab. The work balances high-fidelity data synthesis against strong formal privacy protection, advancing the practical deployment of privacy-aware generative modeling in regulated settings.
📝 Abstract
The increasing demand for privacy-preserving data analytics in finance necessitates solutions for synthetic data generation that rigorously uphold privacy standards. We introduce the DP-Fed-FinDiff framework, a novel integration of Differential Privacy, Federated Learning, and Denoising Diffusion Probabilistic Models designed to generate high-fidelity synthetic tabular data. The framework ensures compliance with stringent privacy regulations while maintaining data utility. We demonstrate the effectiveness of DP-Fed-FinDiff on multiple real-world financial datasets, achieving significant improvements in privacy guarantees without compromising data quality. Our empirical evaluations reveal the optimal trade-offs between privacy budgets, client configurations, and federated optimization strategies. The results affirm the potential of DP-Fed-FinDiff to enable secure data sharing and robust analytics in highly regulated domains, paving the way for further advances in federated learning and privacy-preserving data synthesis.
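
As a rough illustration of how gradient clipping and noise injection can be combined in a federated setting, the sketch below clips each client's model update to a fixed L2 norm, sums the clipped updates, adds Gaussian noise scaled to the clipping bound, and averages. This is the generic Gaussian mechanism applied to federated averaging, not the DP-Fed-FinDiff implementation; the function names (`clip_update`, `dp_federated_average`), the parameters `clip_norm` and `noise_multiplier`, and the toy update vectors are all assumptions made for the example.

```python
import math
import random


def clip_update(update, clip_norm):
    """Scale an update so that its L2 norm is at most clip_norm."""
    norm = math.sqrt(sum(x * x for x in update))
    scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    return [x * scale for x in update]


def dp_federated_average(client_updates, clip_norm, noise_multiplier, rng):
    """Clip each client update, sum them, add Gaussian noise, then average.

    Hypothetical helper: clipping bounds each client's contribution to the
    sum by clip_norm, so the noise standard deviation is set relative to it.
    """
    clipped = [clip_update(u, clip_norm) for u in client_updates]
    dim = len(clipped[0])
    sigma = noise_multiplier * clip_norm
    noisy_sum = [
        sum(u[i] for u in clipped) + rng.gauss(0.0, sigma)
        for i in range(dim)
    ]
    return [s / len(clipped) for s in noisy_sum]


# Toy usage: three clients, each sending a two-dimensional update.
rng = random.Random(0)
updates = [[3.0, 4.0], [0.3, 0.4], [-6.0, 8.0]]
aggregate = dp_federated_average(
    updates, clip_norm=1.0, noise_multiplier=1.0, rng=rng
)
```

A larger `noise_multiplier` corresponds to a smaller privacy budget ε at the cost of a noisier aggregate; translating a noise level into a concrete ε requires a privacy accountant, which this sketch omits.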