🤖 AI Summary
Federated clustering (FC) faces two key challenges: privacy leakage during client collaboration, and degraded accuracy and robustness of the global clustering when local data are non-IID. This paper proposes the first FC framework based on Lagrange coded computing, which reconstructs the global pairwise distance matrix exactly and securely without sharing raw data. The method resists colluding-client attacks by construction, decouples local proxy computation from the clustering algorithm, and supports plug-and-play integration of arbitrary centralized clustering algorithms. The paper provides theoretical guarantees for both exact reconstruction and information-theoretic privacy. Extensive experiments on multiple non-IID benchmarks demonstrate significant improvements over state-of-the-art methods in robustness, clustering effectiveness, and cross-algorithm generality.
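To illustrate the plug-and-play idea, here is a minimal sketch of a centralized algorithm that consumes nothing but a pairwise distance matrix. The choice of k-medoids, the function name `k_medoids`, and the deterministic initialization are assumptions made for illustration; the summary does not say which centralized algorithms the paper actually plugs in.

```python
def k_medoids(D, k, n_iter=20):
    """Cluster n points given only their n x n distance matrix D.

    Illustrative alternating k-medoids; not taken from the paper.
    """
    n = len(D)
    medoids = list(range(k))  # assumption: simple deterministic start

    def assign(meds):
        # Each point joins the cluster of its nearest medoid.
        return [min(range(k), key=lambda c: D[i][meds[c]]) for i in range(n)]

    for _ in range(n_iter):
        labels = assign(medoids)
        new_medoids = []
        for c in range(k):
            members = [i for i in range(n) if labels[i] == c]
            if not members:
                new_medoids.append(medoids[c])  # keep an empty cluster's medoid
                continue
            # New medoid: the member with the smallest total distance to the others.
            best = min(members, key=lambda m: sum(D[m][i] for i in members))
            new_medoids.append(best)
        if new_medoids == medoids:
            break
        medoids = new_medoids

    return assign(medoids), medoids


# Toy usage: four 1-D points, two well-separated groups.
points = [0, 1, 10, 11]
D = [[abs(a - b) for b in points] for a in points]
labels, medoids = k_medoids(D, k=2)
print(labels, medoids)
```

Because the routine never touches the raw points, any procedure that yields the global distance matrix can feed it unchanged.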
📝 Abstract
Federated clustering (FC) aims to discover global cluster structures across decentralized clients without sharing raw data, making privacy preservation a fundamental requirement. Two challenges are critical: (1) privacy leakage during collaboration, and (2) robustness degradation caused by aggregating proxy information from non-independent and identically distributed (Non-IID) local data, which leads to inaccurate or inconsistent global clustering. Existing solutions typically rely on model-specific local proxies, which are sensitive to data heterogeneity and inherit inductive biases from their centralized counterparts, limiting both robustness and generality. We propose Omni Federated Clustering (OmniFC), a unified and model-agnostic framework. Leveraging Lagrange coded computing, our method lets clients share only encoded data, allowing exact reconstruction of the global distance matrix (a fundamental representation of sample relationships) without leaking private information, even under client collusion. The construction is naturally resilient to Non-IID data distributions and decouples FC from model-specific proxies, providing a unified extension mechanism applicable to diverse centralized clustering methods. Theoretical analysis confirms both reconstruction fidelity and privacy guarantees, while comprehensive experiments demonstrate OmniFC's superior robustness, effectiveness, and generality across various benchmarks compared to state-of-the-art methods. Code will be released.
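The core mechanism, computing on encoded data and then interpolating the exact result, can be sketched in a few lines. The following is a toy variant in the spirit of Lagrange coded computing and should not be read as OmniFC's actual protocol: the prime `P`, the evaluation points, the one-sample-per-polynomial encoding, and all function names are assumptions, and samples are taken to be small integer vectors (real-valued data would need quantization first).

```python
import random

P = 2_147_483_647  # assumed prime modulus (2**31 - 1); distances must stay below it


def lagrange_eval(xs, ys, z, p=P):
    """Evaluate at z the unique polynomial through the points (xs[k], ys[k]) over GF(p)."""
    total = 0
    for k, (xk, yk) in enumerate(zip(xs, ys)):
        num, den = 1, 1
        for m, xm in enumerate(xs):
            if m != k:
                num = num * (z - xm) % p
                den = den * (xk - xm) % p
        total = (total + yk * num * pow(den, -1, p)) % p  # pow(..., -1, p) needs Python 3.8+
    return total


def encode(sample, betas, alphas, rng, p=P):
    """Turn one integer vector into len(alphas) shares, masked by T random vectors.

    The encoding polynomial has degree <= T: it equals the sample at betas[0]
    and uniformly random masks at betas[1:].
    """
    T = len(betas) - 1
    masks = [[rng.randrange(p) for _ in sample] for _ in range(T)]
    shares = []
    for a in alphas:
        share = []
        for d, value in enumerate(sample):
            ys = [value % p] + [mask[d] for mask in masks]
            share.append(lagrange_eval(betas, ys, a, p))
        shares.append(share)
    return shares  # shares[n] would be sent to party n


def reconstruct_distances(samples, T=1, seed=0, p=P):
    """Recover all squared Euclidean distances from shares alone."""
    rng = random.Random(seed)  # illustration only; use a cryptographic RNG in practice
    N = 2 * T + 1  # squared distance has degree 2T in z, so 2T + 1 evaluations suffice
    betas = list(range(T + 1))
    alphas = list(range(T + 1, T + 1 + N))  # kept disjoint from betas
    all_shares = [encode(s, betas, alphas, rng, p) for s in samples]
    n = len(samples)
    D = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            # Party k computes the squared distance between its two shares.
            evals = [
                sum((a - b) ** 2 for a, b in zip(all_shares[i][k], all_shares[j][k])) % p
                for k in range(N)
            ]
            # Interpolating back to betas[0] yields the true squared distance mod p.
            D[i][j] = D[j][i] = lagrange_eval(alphas, evals, betas[0], p)
    return D


print(reconstruct_distances([[0, 0], [3, 4], [6, 8]]))
# → [[0, 25, 100], [25, 0, 25], [100, 25, 0]]
```

In this sketch, `T` plays the role of the number of colluding parties tolerated: each share is blinded by `T` independent uniform masks, and the result is exact as long as the true squared distances are smaller than `P`.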