Clustered Federated Learning via Embedding Distributions

📅 2025-06-09
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
To address performance degradation in federated learning caused by non-independent and identically distributed (non-IID) data, this paper proposes a client clustering method based on embedding-space distribution similarity. The method quantifies local embedding distribution discrepancies across clients using Earth Mover’s Distance (EMD) in a single communication round—departing from conventional iterative clustering paradigms—and integrates domain adaptation theory to enhance robustness in distribution modeling. Evaluated on multiple challenging non-IID benchmarks, the approach consistently outperforms 16 state-of-the-art baselines. It achieves simultaneous improvements in clustering quality and global model accuracy, demonstrating superior effectiveness, generalizability, and computational efficiency.

Technology Category

Application Category

📝 Abstract
Federated learning (FL) is a widely used framework for machine learning in distributed data environments where clients hold data that cannot be easily centralised, such as for data protection reasons. FL, however, is known to be vulnerable to non-IID data. Clustered FL addresses this issue by finding more homogeneous clusters of clients. We propose a novel one-shot clustering method, EMD-CFL, using the Earth Mover's distance (EMD) between data distributions in embedding space. We theoretically motivate the use of EMDs using results from the domain adaptation literature and demonstrate empirically superior clustering performance in extensive comparisons against 16 baselines and on a range of challenging datasets.
Problem

Research questions and friction points this paper is trying to address.

Addresses non-IID data vulnerability in Federated Learning
Proposes one-shot clustering using EMD for homogeneous client groups
Improves clustering performance via embedding space distribution analysis
Innovation

Methods, ideas, or system contributions that make the work stand out.

One-shot clustering using EMD
EMD measures embedding distribution distances
Superior performance versus 16 baselines
🔎 Similar Papers
No similar papers found.