🤖 AI Summary
Problem: Muon-style optimization of large language model (LLM) weight matrices is built around the spectral norm; committing to this single norm is restrictive and leaves a broader space of matrix norms unexplored.
Method: We propose Fanions, a novel family of Muon-like optimizers built on duals of the Ky Fan $k$-norms. By dualizing convex combinations of the Ky Fan $k$-norms with either the Frobenius norm or the $\ell_\infty$ norm, we obtain two sub-families, F-Fanions and S-Fanions, whose most prominent members are the concrete algorithms F-Muon and S-Muon. The approach combines matrix dual-norm theory, convex norm composition, and Muon-style updates.
Contribution/Results: We establish a close theoretical relationship between Fanions and Dion, and generalize the Muon framework to a broader class of matrix norms. Empirically, F-Muon and S-Muon consistently match Muon's performance across a wide range of LLM training tasks and settings, while outperforming vanilla Muon on a synthetic linear least-squares benchmark.
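For orientation, here is the standard convex-duality fact behind this construction (our notation; a sketch, not an excerpt from the paper). The Ky Fan $k$-norm sums the $k$ largest singular values, its dual norm is the larger of the spectral norm and the scaled nuclear norm, and linear maximization over the dual-norm ball returns a rank-$k$ orthogonalized matrix:

$$
\|W\|_{(k)} = \sum_{i=1}^{k} \sigma_i(W),
\qquad
\|W\|_{(k)}^{*} = \max\Big\{ \|W\|_2,\; \tfrac{1}{k}\,\|W\|_{*} \Big\},
$$

$$
\operatorname*{arg\,max}_{\|X\|_{(k)}^{*} \le 1} \langle G, X \rangle = U_{:k} V_{:k}^{\top},
\qquad G = U \Sigma V^{\top}.
$$

For $k = \min(m, n)$ this recovers Muon's full orthogonalized update $U V^{\top}$; for smaller $k$ it yields the low-rank updates that underlie the connection to Dion.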
📝 Abstract
In this article, we explore the use of various matrix norms for optimizing functions of weight matrices, a crucial problem in training large language models. Moving beyond the spectral norm underlying the Muon update, we leverage duals of the Ky Fan $k$-norms to introduce a family of Muon-like algorithms we name Fanions, which are closely related to Dion. By working with duals of convex combinations of the Ky Fan $k$-norms with either the Frobenius norm or the $\ell_\infty$ norm, we construct the families of F-Fanions and S-Fanions, respectively. Their most prominent members are F-Muon and S-Muon. We complement our theoretical analysis with an extensive empirical study of these algorithms across a wide range of tasks and settings, demonstrating that F-Muon and S-Muon consistently match Muon's performance, while outperforming vanilla Muon on a synthetic linear least squares problem.
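To make the rank-$k$ update concrete, below is a minimal Python sketch of the orthogonalization step induced by this norm choice. It is illustrative only: the function name `fanion_like_update` is ours, and the actual F-Muon and S-Muon algorithms additionally involve the Frobenius/$\ell_\infty$ mixing and Muon-style momentum described above.

```python
import numpy as np

def fanion_like_update(grad: np.ndarray, k: int) -> np.ndarray:
    """Rank-k orthogonalized descent direction via truncated SVD.

    Illustrative sketch only: this computes the linear maximization
    oracle over the unit ball of the dual Ky Fan k-norm,
        argmax_{||X||_2 <= 1, ||X||_* <= k} <grad, X> = U_k V_k^T,
    which for k = min(m, n) reduces to Muon's full orthogonalization
    U V^T. Momentum and the Frobenius / l_inf mixing used by the
    paper's F-Muon and S-Muon variants are omitted here.
    """
    U, _, Vt = np.linalg.svd(grad, full_matrices=False)
    return U[:, :k] @ Vt[:k, :]

# Toy usage: one steepest-descent step on a weight matrix.
rng = np.random.default_rng(0)
W = rng.standard_normal((8, 4))
G = rng.standard_normal((8, 4))   # stand-in for a gradient
W -= 0.1 * fanion_like_update(G, k=2)
```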