OmniFC: Rethinking Federated Clustering via Lossless and Secure Distance Reconstruction

📅 2025-05-19
📈 Citations: 0
Influential: 0
🤖 AI Summary
Federated clustering (FC) faces two key challenges: privacy leakage during client collaboration, and degraded accuracy and robustness of the global clustering caused by non-IID local data. This paper proposes what the authors describe as the first FC framework based on Lagrange coded computing, which enables lossless and secure reconstruction of the global pairwise distance matrix without sharing raw data. The method inherently resists colluding-client attacks, decouples local proxy computation from the clustering algorithm, and supports plug-and-play integration of arbitrary centralized clustering algorithms. The authors provide theoretical guarantees for both exact reconstruction and information-theoretic privacy. Experiments on multiple non-IID benchmarks show improvements over state-of-the-art methods in robustness, clustering effectiveness, and cross-algorithm generality.

📝 Abstract
Federated clustering (FC) aims to discover global cluster structures across decentralized clients without sharing raw data, making privacy preservation a fundamental requirement. There are two critical challenges: (1) privacy leakage during collaboration, and (2) robustness degradation due to aggregation of proxy information from non-independent and identically distributed (Non-IID) local data, leading to inaccurate or inconsistent global clustering. Existing solutions typically rely on model-specific local proxies, which are sensitive to data heterogeneity and inherit inductive biases from their centralized counterparts, thus limiting robustness and generality. We propose Omni Federated Clustering (OmniFC), a unified and model-agnostic framework. Leveraging Lagrange coded computing, our method enables clients to share only encoded data, allowing exact reconstruction of the global distance matrix (a fundamental representation of sample relationships) without leaking private information, even under client collusion. This construction is naturally resilient to Non-IID data distributions. This approach decouples FC from model-specific proxies, providing a unified extension mechanism applicable to diverse centralized clustering methods. Theoretical analysis confirms both reconstruction fidelity and privacy guarantees, while comprehensive experiments demonstrate OmniFC's superior robustness, effectiveness, and generality across various benchmarks compared to state-of-the-art methods. Code will be released.
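
The idea of recovering an exact distance from encoded data can be illustrated with a small sketch. This is not the paper's protocol: it is a simplified Shamir-style scheme that uses Lagrange interpolation over a prime field, written here only to show why polynomial encoding allows exact reconstruction. The function names, the modulus `P`, the evaluation points, and the toy integer vectors are all assumptions made for illustration.

```python
import random

P = 2_147_483_647  # prime modulus (2**31 - 1); assumed large enough for the toy data


def encode(vec, points, t, rng):
    """Hide each coordinate in a random degree-t polynomial whose constant
    term is the secret value, and evaluate it at every point."""
    shares = [[0] * len(vec) for _ in points]
    for k, secret in enumerate(vec):
        coeffs = [secret % P] + [rng.randrange(P) for _ in range(t)]
        for j, a in enumerate(points):
            shares[j][k] = sum(c * pow(a, e, P) for e, c in enumerate(coeffs)) % P
    return shares


def local_distance(share_x, share_y):
    """Squared distance computed on encoded shares only (one evaluation point)."""
    return sum((a - b) ** 2 for a, b in zip(share_x, share_y)) % P


def interpolate_at_zero(points, values):
    """Lagrange interpolation of the result polynomial, evaluated at z = 0."""
    total = 0
    for j, (aj, vj) in enumerate(zip(points, values)):
        num, den = 1, 1
        for m, am in enumerate(points):
            if m != j:
                num = num * (-am) % P
                den = den * (aj - am) % P
        total = (total + vj * num * pow(den, -1, P)) % P
    return total


def secure_sq_distance(x, y, t=1, seed=0):
    # random.Random is used for reproducibility of the sketch only;
    # a real deployment would need a cryptographically secure source.
    rng = random.Random(seed)
    # The squared difference of two degree-t polynomials has degree 2t,
    # so 2t + 1 distinct nonzero evaluation points determine it exactly.
    points = list(range(1, 2 * t + 2))
    shares_x = encode(x, points, t, rng)
    shares_y = encode(y, points, t, rng)
    evals = [local_distance(a, b) for a, b in zip(shares_x, shares_y)]
    return interpolate_at_zero(points, evals)


print(secure_sq_distance([1, 2, 3], [4, 6, 3]))  # 9 + 16 + 0 = 25
```

In this toy setting, each evaluation point sees only masked values, yet the interpolated result equals the true squared distance as long as it is smaller than `P`. Real-valued data would first need to be quantized into the field, which this sketch does not cover.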
Problem

Research questions and friction points this paper is trying to address.

Privacy leakage in federated clustering collaboration
Robustness degradation from Non-IID data aggregation
Model-specific proxy limitations in existing solutions
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses Lagrange coded computing for secure data sharing
Reconstructs global distance matrix without privacy leaks
Model-agnostic framework resilient to Non-IID data
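
Because the framework outputs a global distance matrix, any centralized algorithm that accepts pairwise distances can in principle be applied to it. The sketch below uses a small k-medoids routine as a stand-in; the choice of k-medoids, the function name `k_medoids`, the farthest-first initialization, and the toy matrix are assumptions for illustration and are not taken from the paper.

```python
def k_medoids(dist, k, max_iter=100):
    """Cluster n items given only an n x n distance matrix (list of lists)."""
    n = len(dist)
    # Deterministic farthest-first initialization, starting from item 0.
    medoids = [0]
    while len(medoids) < k:
        medoids.append(
            max(range(n), key=lambda i: min(dist[i][m] for m in medoids))
        )
    labels = [0] * n
    for _ in range(max_iter):
        # Assignment step: each item joins its nearest medoid.
        labels = [min(range(k), key=lambda c: dist[i][medoids[c]]) for i in range(n)]
        # Update step: the new medoid minimizes total distance within its cluster.
        new_medoids = []
        for c in range(k):
            members = [i for i in range(n) if labels[i] == c]
            if not members:  # keep the old medoid if a cluster becomes empty
                new_medoids.append(medoids[c])
                continue
            new_medoids.append(
                min(members, key=lambda i: sum(dist[i][j] for j in members))
            )
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return labels, medoids


# Toy "reconstructed" matrix: squared distances between six 1-D points.
points = [0, 1, 2, 10, 11, 12]
dist = [[(a - b) ** 2 for b in points] for a in points]
labels, medoids = k_medoids(dist, k=2)
print(labels, medoids)  # [0, 0, 0, 1, 1, 1] [1, 4]
```

The routine never touches raw features, only `dist`, which is the property that makes a distance-matrix interface independent of the downstream clustering model.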
Jie Yan
jieyan@amss.ac.cn
deep generative models, clustering
Xin Liu
Central University of Finance and Economics
Zhong-Yuan Zhang
Central University of Finance and Economics