🤖 AI Summary
This paper studies nonconvex–PL minimax optimization in federated learning under heavy-tailed gradient noise, i.e., gradients whose distribution is non-Gaussian with unbounded variance. To overcome the limitation of conventional algorithms that rely on bounded-variance assumptions, the authors propose two robust federated minimax algorithms with provable convergence guarantees under heavy tails: Fed-NSGDA-M, which employs normalized stochastic gradients, and FedMuon-DA, which integrates the Muon optimizer. Both methods incorporate robust local updates and global aggregation mechanisms to mitigate the impact of heavy-tailed noise. The analysis establishes a convergence rate of $O\big((TNp)^{-\frac{s-1}{2s}}\big)$, where $s>1$ is the tail index characterizing the gradient distribution, significantly generalizing existing convergence analyses for federated minimax optimization. Extensive experiments demonstrate that both algorithms achieve superior robustness and stability over baseline methods under heavy-tailed noise.
📝 Abstract
Heavy-tailed noise has attracted growing attention in nonconvex stochastic optimization, as numerous empirical studies suggest it is a more realistic model than the standard bounded-variance assumption. In this work, we investigate nonconvex-PL minimax optimization under heavy-tailed gradient noise in federated learning. We propose two novel algorithms: Fed-NSGDA-M, which integrates normalized gradients, and FedMuon-DA, which leverages the Muon optimizer for local updates. Both algorithms are designed to effectively address heavy-tailed noise in federated minimax optimization under a milder noise condition. We theoretically establish that both algorithms achieve a convergence rate of $O\big(1/(TNp)^{\frac{s-1}{2s}}\big)$. To the best of our knowledge, these are the first federated minimax optimization algorithms with rigorous theoretical guarantees under heavy-tailed noise. Extensive experiments further validate their effectiveness.
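The normalized-gradient mechanism behind Fed-NSGDA-M can be illustrated with a minimal single-machine sketch. Everything here is an assumption for illustration only: the toy quadratic minimax objective, the Student-t noise (a stand-in for heavy-tailed gradient noise with infinite variance), and the scalar normalization are not the paper's actual algorithm, which runs federated local updates with global aggregation.

```python
import numpy as np

rng = np.random.default_rng(0)

def normalized_step(w, g, eta):
    """One normalized gradient step (scalar case): the step direction has
    unit norm, so a single heavy-tailed gradient sample can move the
    iterate by at most eta."""
    return w - eta * g / (abs(g) + 1e-12)

# Hypothetical toy minimax objective (not from the paper):
#   f(x, y) = 0.5*x**2 + x*y - 0.5*y**2, minimized over x, maximized over y.
def grads(x, y):
    return x + y, x - y  # (df/dx, df/dy)

x, y = 3.0, -2.0
eta = 0.05
for _ in range(2000):
    gx, gy = grads(x, y)
    # Student-t noise with 1.5 degrees of freedom has infinite variance,
    # mimicking heavy-tailed gradient noise with tail index s < 1.5.
    gx += rng.standard_t(1.5)
    gy += rng.standard_t(1.5)
    x = normalized_step(x, gx, eta)   # descent on the min variable
    y = normalized_step(y, -gy, eta)  # ascent on the max variable
```

The key contrast: under the same noise, a plain (unnormalized) stochastic gradient step can jump arbitrarily far on a single outlier sample, whereas each normalized step is bounded by the learning rate, which is what makes convergence analysis possible without a bounded-variance assumption.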