Eigenvectors of Experts are Training-free Non-collapsing Routers

📅 2026-05-29
📈 Citations: 0
✨ Influential: 0

🤖 AI Summary
This work addresses the performance degradation that expert collapse causes in sparse mixture-of-experts (SMoE) models, a problem that existing routing methods do not fully resolve. The authors observe that the principal eigenvectors of expert weight matrices encode rich semantic information. Building on this observation, they propose SSMoE, a training-free spectral routing framework that derives routing decisions directly from the spectral properties of the expert weights, obtained via singular value decomposition (SVD). Unlike conventional approaches, it requires no fine-tuning or retraining. Experiments on diverse language and vision tasks show strong generalization and robustness under both clean and corrupted data.
📝 Abstract
Sparse Mixture of Experts (SMoE) architectures improve the training efficiency of Large Language Models (LLMs) by routing input tokens to a selected subset of specialized experts. Despite their remarkable success, both training and inference in SMoE models suffer from the expert collapse issue (Chi et al., 2022), which degrades model performance. Prior studies primarily focus on improving the router; however, such methods rely on training from scratch or fine-tuning, which incurs high computational and data-processing costs. Furthermore, we demonstrate, both theoretically and empirically, that the issue persists despite these efforts when well-pretrained SMoE models are improved further. To fill this gap, we analyze advanced SMoE models and observe that the eigenvectors of expert weight matrices encode rich semantic information, pointing to an effective alternative to conventional routing strategies. Building on this insight, we propose Singular Value Decomposition SMoE (SSMoE), a novel, training-free framework that leverages spectral properties of the expert weights to address the collapse issue and enhance model performance. Extensive experiments across diverse language and vision tasks, under both clean and corrupted data settings, demonstrate the strong generalization and robustness of SSMoE. Our findings highlight how a deeper understanding of model internals can guide the development of more effective SMoE architectures. Our implementation is publicly available at https://github.com/giangdip2410/SSMoE.
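
The abstract does not spell out how routing scores are computed from the expert weights, so the following is only a minimal NumPy sketch of how a training-free spectral router might look, not the SSMoE implementation. It assumes each expert is a single weight matrix of shape (d_hidden, d_model), uses the top right singular vectors of that matrix (equivalently, the leading eigenvectors of W^T W) as a fixed routing key, and scores a token by the energy of its projection onto that key. The names `spectral_router_keys` and `spectral_route`, the `rank` and `top_k` parameters, and the scoring rule itself are illustrative assumptions.

```python
import numpy as np


def spectral_router_keys(expert_weights, rank=1):
    """Build one routing key per expert from its weight matrix (hypothetical helper).

    expert_weights: list of arrays, each of shape (d_hidden, d_model).
    Returns an array of shape (n_experts, rank, d_model) holding the top
    `rank` right singular vectors of every expert. No training is involved.
    """
    keys = []
    for W in expert_weights:
        # Rows of Vt are right singular vectors, sorted by singular value.
        _, _, Vt = np.linalg.svd(W, full_matrices=False)
        keys.append(Vt[:rank])
    return np.stack(keys)


def spectral_route(tokens, keys, top_k=2, eps=1e-12):
    """Pick top_k experts per token by projection energy (assumed scoring rule).

    tokens: array of shape (n_tokens, d_model).
    Returns (indices, gates), both of shape (n_tokens, top_k).
    """
    # Projection of each token onto each expert's singular vectors.
    proj = np.einsum("td,erd->ter", tokens, keys)
    # Squared projection norm; invariant to the sign ambiguity of the SVD.
    scores = (proj ** 2).sum(axis=-1)
    indices = np.argsort(-scores, axis=-1)[:, :top_k]
    top_scores = np.take_along_axis(scores, indices, axis=-1)
    # Normalize the selected scores so the gates of each token sum to 1.
    gates = top_scores / np.maximum(top_scores.sum(axis=-1, keepdims=True), eps)
    return indices, gates


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    experts = [rng.standard_normal((16, 8)) for _ in range(4)]  # toy weights
    keys = spectral_router_keys(experts, rank=2)
    idx, gates = spectral_route(rng.standard_normal((5, 8)), keys, top_k=2)
    print(idx.shape, gates.shape)
```

Because the keys depend only on the frozen expert weights, they can be computed once and cached. Squared projections are used here so that the arbitrary sign of each singular vector does not affect which expert is chosen.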
Problem

Research questions and friction points this paper is trying to address.

Sparse Mixture of Experts
expert collapse
training-free
routing
Large Language Models
Innovation

Methods, ideas, or system contributions that make the work stand out.

training-free routing
eigenvectors of experts
Sparse Mixture of Experts
expert collapse
spectral analysis