🤖 AI Summary
This paper addresses a practical challenge in federated learning: model parameters reside on a Riemannian manifold, are subject to constraints, and only noisy function values (not gradients) are accessible.
Method: We propose the first zeroth-order federated optimization algorithm on Riemannian manifolds. It employs a zeroth-order Riemannian gradient estimator based on Euclidean random perturbations, circumventing costly tangent-space sampling, and it integrates Riemannian projection operators into a distributed architecture to enable privacy-preserving constrained optimization.
Contributions/Results: We prove that the estimator is unbiased and admits a tight variance bound, and we establish a sublinear convergence rate for the algorithm. Experiments demonstrate superior performance over state-of-the-art zeroth- and first-order federated methods on zeroth-order attacks against deep neural networks and on low-rank model training, achieving a favorable trade-off among computational efficiency, robustness to noise, and practical applicability.
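To make the estimator concrete, here is a minimal sketch on the unit sphere (a stand-in for a general manifold; the function names, step parameters, and sphere constraint are our own illustrative assumptions, not the paper's exact construction). The idea follows the summary: perturb in the ambient Euclidean space, take a forward finite difference of function values after projecting back onto the manifold, and project the averaged estimate onto the tangent space.

```python
import numpy as np

def proj_sphere(x):
    """Projection (retraction) onto the unit sphere."""
    return x / np.linalg.norm(x)

def proj_tangent(x, v):
    """Orthogonal projection onto the tangent space at a unit vector x."""
    return v - np.dot(x, v) * x

def zo_riemannian_grad(f, x, mu=1e-5, num_samples=1000, rng=None):
    """Zeroth-order Riemannian gradient estimate from Euclidean Gaussian
    perturbations and forward differences (an illustrative sketch, not
    the paper's exact estimator)."""
    rng = np.random.default_rng(0) if rng is None else rng
    fx = f(x)
    g = np.zeros_like(x)
    for _ in range(num_samples):
        u = rng.standard_normal(x.shape)  # Euclidean perturbation, no tangent sampling
        g += (f(proj_sphere(x + mu * u)) - fx) / mu * u
    return proj_tangent(x, g / num_samples)
```

For a quick sanity check, one can compare the estimate against the true Riemannian gradient of the Rayleigh quotient f(x) = xᵀAx on the sphere, which is 2(Ax − (xᵀAx)x).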
📝 Abstract
Federated learning (FL) has emerged as a powerful paradigm for collaborative model training across distributed clients while preserving data privacy. However, existing FL algorithms predominantly focus on unconstrained optimization problems with exact gradient information, limiting their applicability in scenarios where only noisy function evaluations are accessible or where model parameters are constrained. To address these challenges, we propose a novel zeroth-order projection-based algorithm on Riemannian manifolds for FL. By leveraging the projection operator, we introduce a computationally efficient zeroth-order Riemannian gradient estimator. Unlike existing estimators, ours requires only a simple Euclidean random perturbation, eliminating the need to sample random vectors in the tangent space and thus reducing computational cost. Theoretically, we first prove the approximation properties of the estimator and then establish the sublinear convergence of the proposed algorithm, matching the rate of its first-order counterpart. Numerically, we first assess the efficiency of our estimator using kernel principal component analysis. We then apply the proposed algorithm to two real-world scenarios, zeroth-order attacks on deep neural networks and low-rank neural network training, to validate the theoretical findings.
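A rough illustration of how such an estimator slots into a projection-based federated loop follows. This is a hedged sketch under simplifying assumptions: the client/server structure, the sphere constraint, and all step sizes are illustrative choices of ours, not the paper's algorithm. Each client runs local projected zeroth-order steps; the server averages the client models and projects the average back onto the manifold.

```python
import numpy as np

def proj_sphere(x):
    """Projection (retraction) onto the unit sphere."""
    return x / np.linalg.norm(x)

def zo_grad(f, x, mu, rng, num_samples=20):
    """Averaged Euclidean-perturbation zeroth-order gradient estimate,
    projected onto the tangent space (illustrative sketch)."""
    fx = f(x)
    g = np.zeros_like(x)
    for _ in range(num_samples):
        u = rng.standard_normal(x.shape)
        g += (f(proj_sphere(x + mu * u)) - fx) / mu * u
    g /= num_samples
    return g - np.dot(x, g) * x

def federated_zo_round(client_fns, x, local_steps=3, lr=0.05, mu=1e-4, rng=None):
    """One communication round: local projected ZO steps on each client,
    then server-side averaging followed by projection onto the manifold."""
    rng = np.random.default_rng(0) if rng is None else rng
    client_models = []
    for f in client_fns:
        xi = x.copy()
        for _ in range(local_steps):
            xi = proj_sphere(xi - lr * zo_grad(f, xi, mu, rng))  # projected step
        client_models.append(xi)
    return proj_sphere(np.mean(client_models, axis=0))  # average, then project
```

As a usage example, minimizing the average of two clients' Rayleigh quotients over the sphere with this loop drives the global objective down while keeping every iterate feasible (unit norm), mirroring the constrained, gradient-free setting described in the abstract.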