🤖 AI Summary
Problem: Muon-style optimization of large language model (LLM) weight matrices is built around the spectral norm; committing to this single norm is restrictive and leaves a broader space of matrix norms unexplored.
Method: We propose Fanions, a novel family of Muon-like optimizers built on duals of the Ky Fan $k$-norms. By dualizing convex combinations of the Ky Fan $k$-norms with either the Frobenius norm or the $\ell_\infty$ norm, we obtain two sub-families, F-Fanions and S-Fanions, whose most prominent members are the concrete algorithms F-Muon and S-Muon. The approach combines matrix dual-norm theory, convex norm composition, and Muon-style updates.
Contribution/Results: We establish a close theoretical relationship between Fanions and Dion, and generalize the Muon framework to a broader class of matrix norms. Empirically, F-Muon and S-Muon consistently match Muon's performance across a wide range of LLM training tasks and settings, while outperforming vanilla Muon on a synthetic linear least-squares benchmark.
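For orientation, here is the standard convex-duality fact behind this construction (our notation; a sketch, not an excerpt from the paper). The Ky Fan $k$-norm sums the $k$ largest singular values, its dual norm is the larger of the spectral norm and the scaled nuclear norm, and linear maximization over the dual-norm ball returns a rank-$k$ orthogonalized matrix:

$$
\|W\|_{(k)} = \sum_{i=1}^{k} \sigma_i(W),
\qquad
\|W\|_{(k)}^{*} = \max\Big\{ \|W\|_2,\; \tfrac{1}{k}\,\|W\|_{*} \Big\},
$$

$$
\operatorname*{arg\,max}_{\|X\|_{(k)}^{*} \le 1} \langle G, X \rangle = U_{:k} V_{:k}^{\top},
\qquad G = U \Sigma V^{\top}.
$$

For $k = \min(m, n)$ this recovers Muon's full orthogonalized update $U V^{\top}$; for smaller $k$ it yields the low-rank updates that underlie the connection to Dion.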
📝 Abstract
In this article, we explore the use of various matrix norms for optimizing functions of weight matrices, a crucial problem in training large language models. Moving beyond the spectral norm underlying the Muon update, we leverage duals of the Ky Fan $k$-norms to introduce a family of Muon-like algorithms we name Fanions, which are closely related to Dion. By working with duals of convex combinations of the Ky Fan $k$-norms with either the Frobenius norm or the $\ell_\infty$ norm, we construct the families of F-Fanions and S-Fanions, respectively. Their most prominent members are F-Muon and S-Muon. We complement our theoretical analysis with an extensive empirical study of these algorithms across a wide range of tasks and settings, demonstrating that F-Muon and S-Muon consistently match Muon's performance, while outperforming vanilla Muon on a synthetic linear least squares problem.
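To make the rank-$k$ update concrete, below is a minimal Python sketch of the orthogonalization step induced by this norm choice. It is illustrative only: the function name `fanion_like_update` is ours, and the actual F-Muon and S-Muon algorithms additionally involve the Frobenius/$\ell_\infty$ mixing and Muon-style momentum described above.

```python
import numpy as np

def fanion_like_update(grad: np.ndarray, k: int) -> np.ndarray:
    """Rank-k orthogonalized descent direction via truncated SVD.

    Illustrative sketch only: this computes the linear maximization
    oracle over the unit ball of the dual Ky Fan k-norm,
        argmax_{||X||_2 <= 1, ||X||_* <= k} <grad, X> = U_k V_k^T,
    which for k = min(m, n) reduces to Muon's full orthogonalization
    U V^T. Momentum and the Frobenius / l_inf mixing used by the
    paper's F-Muon and S-Muon variants are omitted here.
    """
    U, _, Vt = np.linalg.svd(grad, full_matrices=False)
    return U[:, :k] @ Vt[:k, :]

# Toy usage: one steepest-descent step on a weight matrix.
rng = np.random.default_rng(0)
W = rng.standard_normal((8, 4))
G = rng.standard_normal((8, 4))   # stand-in for a gradient
W -= 0.1 * fanion_like_update(G, k=2)
```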