Differentially Private Federated Learning of Diffusion Models for Synthetic Tabular Data Generation

📅 2024-12-20

🏛️ arXiv.org

📈 Citations: 2

✨ Influential: 0

🤖 AI Summary

To address the need for privacy-preserving synthetic data generation in highly regulated domains such as finance, this paper proposes the first end-to-end framework integrating differential privacy (DP), federated learning (FL), and denoising diffusion probabilistic models (DDPMs). The method introduces a dynamic privacy budget allocation mechanism and a client-aware optimization strategy that accommodate heterogeneous device constraints, and it combines gradient clipping with noise injection to enforce the privacy guarantee. Evaluated on multiple real-world financial datasets under strict DP constraints (ε ≤ 8), the approach achieves downstream task utility exceeding 95% of that obtained with the original data, significantly outperforming the state-of-the-art baselines DP-GAN and DP-FedTab. The work balances high-fidelity data synthesis against strong formal privacy protection, advancing practical deployment of privacy-aware generative modeling in regulated settings.
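The combination of gradient clipping, noise injection, and federated aggregation mentioned above can be illustrated with a minimal sketch. This is not the paper's implementation: the function names (`clip_l2`, `privatize`, `federated_average`), the choice to add Gaussian noise on the client side, and the parameter values are all assumptions made for illustration, and the sketch performs no privacy accounting, so it does not compute an ε.

```python
import math
import random


def clip_l2(vec, max_norm):
    """Scale vec down so that its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(v * v for v in vec))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [v * scale for v in vec]


def privatize(vec, max_norm, noise_multiplier, rng):
    """Clip the update, then add Gaussian noise with std noise_multiplier * max_norm."""
    clipped = clip_l2(vec, max_norm)
    std = noise_multiplier * max_norm
    return [v + rng.gauss(0.0, std) for v in clipped]


def federated_average(updates):
    """Element-wise mean of the client updates (plain federated averaging)."""
    n = len(updates)
    return [sum(column) / n for column in zip(*updates)]


# Hypothetical model updates from three clients (two parameters each).
rng = random.Random(0)
client_updates = [[3.0, 4.0], [0.3, 0.4], [-6.0, 8.0]]

# Each client clips and perturbs its update before sending it to the server.
noisy_updates = [
    privatize(u, max_norm=1.0, noise_multiplier=0.5, rng=rng)
    for u in client_updates
]

# The server only ever sees the privatized updates.
global_update = federated_average(noisy_updates)
print(global_update)
```

Clipping bounds how much any single update can move the aggregate, which is what lets the noise scale be tied to `max_norm`. A larger `noise_multiplier` generally means stronger privacy and lower utility, which is the trade-off the summary describes.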

📝 Abstract

The increasing demand for privacy-preserving data analytics in finance necessitates solutions for synthetic data generation that rigorously uphold privacy standards. We introduce the DP-Fed-FinDiff framework, a novel integration of Differential Privacy, Federated Learning, and Denoising Diffusion Probabilistic Models designed to generate high-fidelity synthetic tabular data. This framework ensures compliance with stringent privacy regulations while maintaining data utility. We demonstrate the effectiveness of DP-Fed-FinDiff on multiple real-world financial datasets, achieving significant improvements in privacy guarantees without compromising data quality. Our empirical evaluations reveal the optimal trade-offs between privacy budgets, client configurations, and federated optimization strategies. The results affirm the potential of DP-Fed-FinDiff to enable secure data sharing and robust analytics in highly regulated domains, paving the way for further advances in federated learning and privacy-preserving data synthesis.

Problem

Research questions and friction points this paper is trying to address.

Generating synthetic tabular data with privacy guarantees

Integrating differential privacy with federated diffusion modeling

Balancing privacy budgets and data utility in synthesis

Innovation

Methods, ideas, or system contributions that make the work stand out.

Federated learning with differential privacy

Denoising diffusion models for tabular data

Privacy-preserving synthetic data generation
