🤖 AI Summary
This work addresses the challenge of dynamically composing pre-trained expert language models at zero training cost. We propose a parameter-free, frequency-domain-inspired spectral routing mechanism that analyzes the spectral characteristics of attention weights token-wise and layer-wise during inference, leveraging spectral graph theory to select or weight-merge the most suitable experts in real time, without fine-tuning or gradient computation. Unlike static routing or trainable dynamic approaches, our method achieves the first fully training-free, fine-grained (token- and layer-level) expert composition. Experiments across multi-domain expert tasks demonstrate significant improvements in routing accuracy and average performance over existing zero-training baselines, while keeping inference overhead low.
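To make the summary concrete, the routing idea can be sketched as follows. This is an illustrative toy, not the paper's implementation: the function names (`spectral_signature`, `route`), the choice of the unnormalized graph Laplacian, the use of its smallest eigenvalues as a signature, and the distance-based softmax over experts are all assumptions made for the sketch.

```python
import numpy as np

def spectral_signature(attn, k=4):
    """Spectral signature of one token's attention map, viewed as a weighted graph.

    `attn`: (n, n) attention weights for a single layer/head (hypothetical input).
    Returns the k smallest eigenvalues of the graph Laplacian as a feature vector.
    """
    sym = 0.5 * (attn + attn.T)            # symmetrize: treat attention as an undirected graph
    lap = np.diag(sym.sum(axis=1)) - sym   # unnormalized graph Laplacian L = D - A
    eigvals = np.linalg.eigvalsh(lap)      # real, sorted spectrum (L is symmetric)
    return eigvals[:k]

def route(attn, expert_signatures, temperature=1.0):
    """Training-free routing weights over experts from spectral similarity (illustrative).

    `expert_signatures`: dict mapping expert name -> reference spectral signature,
    assumed to be precomputed offline per expert domain.
    """
    k = len(next(iter(expert_signatures.values())))
    sig = spectral_signature(attn, k=k)
    names = list(expert_signatures)
    # Closer spectra -> higher score; softmax turns scores into merge weights.
    scores = np.array([-np.linalg.norm(sig - expert_signatures[n]) for n in names])
    weights = np.exp(scores / temperature)
    weights /= weights.sum()
    return dict(zip(names, weights))
```

The resulting weights could either select the top-scoring expert (routing) or weight-merge expert outputs per token and per layer; no gradients are computed at any point.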
📝 Abstract
Training large, general-purpose language models poses significant challenges. The growing availability of specialized expert models, fine-tuned from pre-trained models for specific tasks or domains, offers a promising alternative. Leveraging these existing expert models in real-world applications requires effective methods to select or merge the models best suited to a given task. This paper introduces SPECTR, an approach for dynamically composing expert models at each time step during inference. Notably, our method requires no additional training and enables flexible, token- and layer-wise model combinations. Our experimental results demonstrate that SPECTR improves routing accuracy over alternative training-free methods and increases task performance across expert domains.