Differentially Private Clustered Federated Learning

📅 2024-05-29
📈 Citations: 2
Influential: 0
📄 PDF
🤖 AI Summary
In differentially private federated learning (DPFL), existing client clustering methods suffer from sensitivity to DP noise and failure under structured data heterogeneity. To address this, we propose a robust privacy-preserving client clustering algorithm. Our method jointly models gradient updates and training loss—two complementary signals—to enhance clustering discriminability, marking the first such dual-signal integration in DPFL. Furthermore, it synergistically combines large-batch gradient estimation with Gaussian mixture modeling (GMM) to suppress both DP noise and stochastic perturbations, and provides theoretical guarantees on clustering consistency. Experiments under stringent privacy budgets (ε ≤ 2) and diverse non-IID settings demonstrate that our approach improves model accuracy by 12.7% on average and boosts clustering accuracy by 31.5% over baselines, while significantly enhancing convergence stability.

Technology Category

Application Category

📝 Abstract
Federated learning (FL), which is a decentralized machine learning (ML) approach, often incorporates differential privacy (DP) to provide rigorous data privacy guarantees. Previous works attempted to address high structured data heterogeneity in vanilla FL settings through clustering clients (a.k.a clustered FL), but these methods remain sensitive and prone to errors, further exacerbated by the DP noise. This vulnerability makes the previous methods inappropriate for differentially private FL (DPFL) settings with structured data heterogeneity. To address this gap, we propose an algorithm for differentially private clustered FL, which is robust to the DP noise in the system and identifies the underlying clients' clusters correctly. To this end, we propose to cluster clients based on both their model updates and training loss values. Furthermore, for clustering clients' model updates at the end of the first round, our proposed approach addresses the server's uncertainties by employing large batch sizes as well as Gaussian Mixture Models (GMM) to reduce the impact of DP and stochastic noise and avoid potential clustering errors. This idea is efficient especially in privacy-sensitive scenarios with more DP noise. We provide theoretical analysis to justify our approach and evaluate it across diverse data distributions and privacy budgets. Our experimental results show its effectiveness in addressing large structured data heterogeneity in DPFL.
Problem

Research questions and friction points this paper is trying to address.

Addresses data privacy in federated learning
Improves clustering accuracy with differential privacy
Reduces noise impact in clustered FL settings
Innovation

Methods, ideas, or system contributions that make the work stand out.

Differentially Private Clustered FL
Clustering based on model updates
Gaussian Mixture Models reduce noise
🔎 Similar Papers
S
Saber Malekmohammadi
Mila - Quebec AI institute, School of Computer Science, University of Waterloo, Waterloo, Canada
Afaf Taïk
Afaf Taïk
Université de Sherbrooke
Machine learningFederated LearningAlgorithmic Fairness
G
G. Farnadi
Mila - Quebec AI institute, McGill University, Université de Montréal, Montreal, Canada