Eigenvectors of Experts are Training-free Non-collapsing Routers

📅 2026-05-29

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the performance degradation that expert collapse causes in sparse mixture-of-experts (SMoE) models, a problem that existing routing methods do not fully resolve. The authors propose SSMoE, a framework built on the observation that the eigenvectors of expert weight matrices encode rich semantic information. Leveraging this insight, they introduce a training-free spectral routing mechanism that derives routing decisions directly from the spectral properties of the expert weight matrices via singular value decomposition (SVD). Unlike conventional approaches, it requires neither fine-tuning nor retraining. Experiments across diverse language and vision tasks, under both clean and corrupted data settings, show strong generalization and robustness.

📝 Abstract

Sparse Mixture of Experts (SMoE) architectures improve the training efficiency of Large Language Models (LLMs) by routing input tokens to a selected subset of specialized experts. Despite their remarkable success, both training and inference in SMoE models suffer from the expert collapse issue (Chi et al., 2022), which degrades model performance. Prior studies primarily focus on improving the router; however, such methods rely on training from scratch or fine-tuning, which incurs high computational and data-processing costs. Furthermore, we demonstrate that, despite these efforts, the issue persists when advancing well-pretrained SMoE models, as evidenced by both theoretical and empirical results. To fill this gap, we analyze advanced SMoE models and observe that the eigenvectors of expert weight matrices encode rich semantic information, pointing to an effective alternative to conventional routing strategies. Building on this insight, we propose Singular Value Decomposition SMoE (SSMoE), a novel and training-free framework that leverages spectral properties of the expert weights to address the collapse issue and enhance model performance. Extensive experiments across diverse language and vision tasks, under both clean and corrupted data settings, demonstrate the strong generalization and robustness of SSMoE. Our findings highlight how a deeper understanding of model internals can guide the development of more effective SMoE architectures. Our implementation is publicly available at https://github.com/giangdip2410/SSMoE.
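To make the idea of routing from spectral properties concrete, here is a minimal NumPy sketch. It is not the authors' implementation: the scoring rule (absolute projection of a token onto each expert's top right singular vector), the top-k softmax gating, and the names `principal_directions` and `spectral_route` are all assumptions chosen for illustration. It only shows how a router could be built from expert weights without training any router parameters.

```python
import numpy as np

def principal_directions(expert_weights):
    """Top right singular vector of each expert weight matrix.

    Assumption: each W has shape (d_hidden, d_model), so right singular
    vectors live in the token (input) space.
    """
    dirs = []
    for W in expert_weights:
        # Reduced SVD; rows of Vt are right singular vectors, sorted by
        # descending singular value.
        _, _, Vt = np.linalg.svd(W, full_matrices=False)
        dirs.append(Vt[0])
    return np.stack(dirs)  # (n_experts, d_model)

def spectral_route(tokens, directions, top_k=2):
    """Pick top_k experts per token from projection magnitudes (hypothetical rule)."""
    # The sign of a singular vector is arbitrary, so score by absolute value.
    scores = np.abs(tokens @ directions.T)            # (n_tokens, n_experts)
    top_idx = np.argsort(-scores, axis=1)[:, :top_k]  # best experts first
    top_scores = np.take_along_axis(scores, top_idx, axis=1)
    # Softmax over the selected experts only, shifted for numerical stability.
    z = np.exp(top_scores - top_scores.max(axis=1, keepdims=True))
    gates = z / z.sum(axis=1, keepdims=True)
    return top_idx, gates

# Toy experts whose dominant input directions are the three coordinate axes.
experts = [
    np.array([[3.0, 0.0, 0.0], [0.0, 1.0, 0.0]]),
    np.array([[0.0, 3.0, 0.0], [0.0, 0.0, 1.0]]),
    np.array([[1.0, 0.0, 0.0], [0.0, 0.0, 3.0]]),
]
tokens = np.array([[1.0, 0.0, 0.1], [0.0, 2.0, 0.1]])
directions = principal_directions(experts)
top_idx, gates = spectral_route(tokens, directions, top_k=2)
```

In this toy setup each token is sent first to the expert whose dominant direction it aligns with most. No router weights are learned, which is the sense in which such a scheme is training-free; how SSMoE actually combines spectral information with the pretrained router is described in the paper and the linked repository.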

Problem

Research questions and friction points this paper is trying to address.

Sparse Mixture of Experts

expert collapse

training-free

routing

Large Language Models

Innovation

Methods, ideas, or system contributions that make the work stand out.

training-free routing

eigenvectors of experts

Sparse Mixture of Experts

expert collapse

spectral analysis
