Differentially Private Clustered Federated Learning

📅 2024-05-29

📈 Citations: 2

✨ Influential: 0

🤖 AI Summary

In differentially private federated learning (DPFL), existing client clustering methods suffer from sensitivity to DP noise and failure under structured data heterogeneity. To address this, we propose a robust privacy-preserving client clustering algorithm. Our method jointly models gradient updates and training loss—two complementary signals—to enhance clustering discriminability, marking the first such dual-signal integration in DPFL. Furthermore, it synergistically combines large-batch gradient estimation with Gaussian mixture modeling (GMM) to suppress both DP noise and stochastic perturbations, and provides theoretical guarantees on clustering consistency. Experiments under stringent privacy budgets (ε ≤ 2) and diverse non-IID settings demonstrate that our approach improves model accuracy by 12.7% on average and boosts clustering accuracy by 31.5% over baselines, while significantly enhancing convergence stability.

Technology Category

Application Category

📝 Abstract

Federated learning (FL), which is a decentralized machine learning (ML) approach, often incorporates differential privacy (DP) to provide rigorous data privacy guarantees. Previous works attempted to address high structured data heterogeneity in vanilla FL settings through clustering clients (a.k.a clustered FL), but these methods remain sensitive and prone to errors, further exacerbated by the DP noise. This vulnerability makes the previous methods inappropriate for differentially private FL (DPFL) settings with structured data heterogeneity. To address this gap, we propose an algorithm for differentially private clustered FL, which is robust to the DP noise in the system and identifies the underlying clients' clusters correctly. To this end, we propose to cluster clients based on both their model updates and training loss values. Furthermore, for clustering clients' model updates at the end of the first round, our proposed approach addresses the server's uncertainties by employing large batch sizes as well as Gaussian Mixture Models (GMM) to reduce the impact of DP and stochastic noise and avoid potential clustering errors. This idea is efficient especially in privacy-sensitive scenarios with more DP noise. We provide theoretical analysis to justify our approach and evaluate it across diverse data distributions and privacy budgets. Our experimental results show its effectiveness in addressing large structured data heterogeneity in DPFL.

Problem

Research questions and friction points this paper is trying to address.

Addresses data privacy in federated learning

Improves clustering accuracy with differential privacy

Reduces noise impact in clustered FL settings

Innovation

Methods, ideas, or system contributions that make the work stand out.

Differentially Private Clustered FL

Clustering based on model updates

Gaussian Mixture Models reduce noise

🔎 Similar Papers

Differentially Private Federated Learning: A Systematic Review

2024-05-14arXiv.orgCitations: 24

Authors to Follow