On Provable Benefits of Muon in Federated Learning

📅 2025-10-04
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
The performance of the Muon optimizer in federated learning (FL) has so far been unexplored. Method: This paper introduces Muon to FL for the first time, proposing FedMuon, a novel distributed optimization algorithm. FedMuon orthogonalizes its gradient update directions, which makes its learning rate independent of problem-specific parameters (e.g., Lipschitz constants or gradient variance bounds). The authors establish its convergence for non-convex objectives and prove its robustness against heavy-tailed stochastic noise. Results: Experiments across diverse neural architectures (CNNs, RNNs, ViTs) and heterogeneous data settings demonstrate that FedMuon significantly improves convergence speed and training stability over baselines including FedAvg and FedAdam. Key contributions: (i) the first federated adaptation of Muon; (ii) a parameter-free adaptive mechanism grounded in orthogonalized updates; and (iii) the first theoretical convergence guarantees for an adaptive FL optimizer under heavy-tailed noise.
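The summary describes FedMuon only at a high level, and the paper's exact update rules are not reproduced on this page. The following is a minimal sketch of one communication round under the usual Muon recipe (local heavy-ball momentum whose direction is orthogonalized before being applied) combined with FedAvg-style server averaging; the function names, the SVD-based orthogonalization, and the round structure are illustrative assumptions, not the paper's algorithm.

```python
import numpy as np

def orthogonalize(G):
    # Map G = U S V^T to U V^T (exact, via SVD; Muon approximates
    # this step with a Newton-Schulz iteration in practice).
    U, _, Vt = np.linalg.svd(G, full_matrices=False)
    return U @ Vt

def fedmuon_round(w, client_grads, local_steps=5, lr=0.1, beta=0.9):
    """One communication round of a FedMuon-style algorithm (sketch).

    Each client runs `local_steps` momentum steps with orthogonalized
    update directions, then the server averages the resulting weights.
    `client_grads[k](w)` returns client k's stochastic gradient at w.
    """
    new_params = []
    for grad_fn in client_grads:
        w_k, m_k = w.copy(), np.zeros_like(w)
        for _ in range(local_steps):
            m_k = beta * m_k + grad_fn(w_k)   # heavy-ball momentum
            w_k = w_k - lr * orthogonalize(m_k)  # orthogonalized step
        new_params.append(w_k)
    return np.mean(new_params, axis=0)  # server-side averaging
```

Because the applied direction always has unit singular values, the effective step size is governed by `lr` alone, which is the mechanism behind the "parameter-free" learning-rate claim in the summary.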

πŸ“ Abstract
The recently introduced optimizer, Muon, has gained increasing attention due to its superior performance across a wide range of applications. However, its effectiveness in federated learning remains unexplored. To address this gap, this paper investigates the performance of Muon in the federated learning setting. Specifically, we propose a new algorithm, FedMuon, and establish its convergence rate for nonconvex problems. Our theoretical analysis reveals multiple favorable properties of FedMuon. In particular, due to its orthonormalized update direction, the learning rate of FedMuon is independent of problem-specific parameters and, importantly, it can naturally accommodate heavy-tailed noise. Extensive experiments on a variety of neural network architectures validate the effectiveness of the proposed algorithm.
Problem

Research questions and friction points this paper is trying to address.

How does the Muon optimizer perform in the federated learning setting, where it has not previously been studied?
Can a federated variant of Muon retain convergence guarantees for nonconvex problems?
Does the algorithm remain effective across neural architectures and under heavy-tailed gradient noise?
Innovation

Methods, ideas, or system contributions that make the work stand out.

FedMuon, the first adaptation of the Muon optimizer to federated learning
Orthonormalized update directions that make the learning rate independent of problem-specific parameters
Naturally accommodates heavy-tailed stochastic noise
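The orthonormalization behind these updates is typically computed without an explicit SVD. Below is a minimal sketch of the quintic Newton-Schulz iteration popularized by Muon, using the widely circulated coefficients (3.4445, -4.7750, 2.0315); FedMuon's exact variant may differ, so treat this as an illustration of the idea rather than the paper's implementation.

```python
import numpy as np

def newton_schulz_orthogonalize(G, steps=5):
    """Approximately map G = U S V^T to U V^T via Newton-Schulz.

    The iteration drives all singular values toward 1 without
    computing an SVD, using only matrix multiplications.
    """
    a, b, c = 3.4445, -4.7750, 2.0315  # quintic coefficients used by Muon
    X = G / (np.linalg.norm(G) + 1e-7)  # normalize so the iteration converges
    transposed = G.shape[0] > G.shape[1]
    if transposed:  # iterate in the orientation with the smaller Gram matrix
        X = X.T
    for _ in range(steps):
        A = X @ X.T
        B = b * A + c * (A @ A)
        X = a * X + B @ X  # X <- a*X + b*(XX^T)X + c*(XX^T)^2 X
    return X.T if transposed else X
```

After a handful of steps, the singular values of the output cluster around 1 (not exactly 1, which is why this counts as an approximate orthogonalization), at a cost of a few matmuls per gradient, which is cheap on accelerators.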