AI Summary
This paper addresses the combined challenge of concealing cluster identities and resisting Byzantine attacks in clustered federated learning (CFL) under data heterogeneity. It proposes the first privacy-preserving, robust framework for clustered collaborative training, built from three components: (1) secure aggregation via secure multi-party computation, which prevents cluster-identity leakage; (2) a gradient-correlation-based detection and weighted aggregation mechanism, which provides both Byzantine resilience and convergence; and (3) a lightweight verification scheme for client-side encoding, which jointly guarantees authentication and confidentiality. A theoretical analysis proves the framework's security. The client-side communication and computation complexities are O(ml + m^2) and O(m^2 l), respectively, where m is the number of clusters and l is the gradient size. When m = 1, the client-side computation is at least O(log n) times more efficient than that of the baselines, where n is the number of clients. Experiments validate the scheme.
Abstract
Despite the potential of federated learning (FL) for collaborative learning, its performance deteriorates when the data of distributed users is heterogeneous. Clustered federated learning (CFL) has recently emerged to address this challenge by partitioning users into clusters according to their similarity. However, CFL is difficult to train when users are unwilling to share their cluster identities because of privacy concerns. To address these issues, we present an innovative Efficient and Robust Secure Aggregation scheme for CFL, dubbed EBS-CFL. EBS-CFL supports effective CFL training while keeping users' cluster identities confidential. Moreover, it detects potential poisoning attacks without compromising individual client gradients: it discards negatively correlated gradients and aggregates positively correlated ones with a weighted approach. The server also verifies that clients have encoded their gradients correctly. EBS-CFL is efficient, with client-side overhead of O(ml + m^2) for communication and O(m^2 l) for computation, where m is the number of cluster identities and l is the gradient size. When m = 1, the client-side computational efficiency of EBS-CFL is at least O(log n) times better than that of the comparison schemes, where n is the number of clients. In addition, we validate the scheme through extensive experiments. Finally, we theoretically prove the scheme's security.
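The rule of discarding negatively correlated gradients and weighting the positively correlated ones can be illustrated with a minimal plaintext sketch. This is not the EBS-CFL protocol itself, which performs the check under secure aggregation without exposing individual gradients. The abstract does not specify the correlation measure, the reference direction, or the weights, so the sketch assumes cosine similarity against a hypothetical trusted reference gradient and uses the similarity score as the weight. The names `cosine` and `filter_and_aggregate` are illustrative only.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def filter_and_aggregate(gradients, reference):
    """Drop gradients that do not correlate positively with `reference`,
    then return the similarity-weighted average of the remaining ones.

    Assumption: `reference` is a trusted direction (hypothetical here);
    the source does not say how the correlation is anchored.
    """
    scores = [cosine(g, reference) for g in gradients]
    # Keep only gradients with strictly positive correlation.
    kept = [(s, g) for s, g in zip(scores, gradients) if s > 0.0]
    if not kept:
        # Nothing passed the filter: return a zero update.
        return [0.0] * len(reference)
    total = sum(s for s, _ in kept)
    dim = len(reference)
    return [sum(s * g[i] for s, g in kept) / total for i in range(dim)]


if __name__ == "__main__":
    reference = [1.0, 0.0]
    honest = [[1.0, 0.0], [2.0, 0.0]]
    poisoned = [[-5.0, 0.0]]  # points against the reference direction
    print(filter_and_aggregate(honest + poisoned, reference))
```

In this toy run the poisoned gradient has negative similarity to the reference and is excluded, so only the two honest gradients contribute to the update.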