SpectR: Dynamically Composing LM Experts with Spectral Routing

📅 2025-04-04
🤖 AI Summary
This work addresses the challenge of dynamically composing pre-trained expert language models at zero training cost. We propose a parameter-free, frequency-domain-inspired spectral routing mechanism that analyzes the spectral characteristics of attention weights token-wise and layer-wise during inference, leveraging spectral graph theory to select or weight-merge the most suitable experts in real time, without fine-tuning or gradient computation. Unlike static routing or trainable dynamic approaches, our method achieves the first fully training-free, fine-grained (token- and layer-level) expert composition. Experiments across multi-domain expert tasks demonstrate significant improvements in routing accuracy and average performance over existing zero-training baselines, while maintaining low inference overhead.
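
To make the idea of training-free, per-token routing more concrete, here is a minimal NumPy sketch. It is an illustration under stated assumptions, not the paper's algorithm: it assumes each expert at a given layer is available as a weight-delta matrix (the hypothetical `expert_deltas`), takes the SVD of each delta, and scores a token's hidden state by the singular-value-weighted size of its projection onto the expert's top right-singular vectors. The names `spectral_scores`, `softmax`, and `rank` are made up for this example.

```python
import numpy as np

def spectral_scores(x, expert_deltas, rank=4):
    """Score each expert for one token hidden state x (shape [d_in]).

    Assumption (not from the source): each expert is a weight delta of
    shape [d_out, d_in]. The score is the norm of x projected onto the
    expert's top right-singular vectors, scaled by the singular values.
    """
    scores = []
    for delta in expert_deltas:
        # The SVD uses only frozen weights: no gradients, no training.
        # In practice it would be computed once per expert and cached.
        _, s, vt = np.linalg.svd(delta, full_matrices=False)
        k = min(rank, len(s))
        proj = s[:k] * (vt[:k] @ x)
        scores.append(np.linalg.norm(proj))
    return np.array(scores)

def softmax(z):
    """Turn raw scores into merge weights that sum to 1."""
    e = np.exp(z - z.max())
    return e / e.sum()

# Toy example: expert 0 only acts on input axis 0, expert 1 on axis 1.
d = 8
e0 = np.zeros((d, d))
e0[0, 0] = 1.0
e1 = np.zeros((d, d))
e1[1, 1] = 1.0

x = np.zeros(d)
x[1] = 2.0  # this token lies entirely along axis 1

scores = spectral_scores(x, [e0, e1], rank=1)
weights = softmax(scores)      # soft merge weights
best = int(np.argmax(scores))  # hard selection
print(scores, weights, best)
```

In this toy setup the token aligned with axis 1 receives a score of 0 for the first expert and 2 for the second, so hard selection picks the second expert and the soft weights favor it.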

📝 Abstract
Training large, general-purpose language models poses significant challenges. The growing availability of specialized expert models, fine-tuned from pretrained models for specific tasks or domains, offers a promising alternative. Leveraging the potential of these existing expert models in real-world applications requires effective methods to select or merge the models best suited for a given task. This paper introduces SPECTR, an approach for dynamically composing expert models at each time step during inference. Notably, our method requires no additional training and enables flexible, token- and layer-wise model combinations. Our experimental results demonstrate that SPECTR improves routing accuracy over alternative training-free methods, increasing task performance across expert domains.
Problem

Research questions and friction points this paper is trying to address.

Dynamically compose expert models during inference
Select or merge models for specific tasks
Improve routing accuracy without additional training
Innovation

Methods, ideas, or system contributions that make the work stand out.

Dynamically composes expert models per time step
No additional training required for model composition
Enables token- and layer-wise flexible combinations
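
The difference between selecting and merging experts per token can be sketched as follows. This is a hedged illustration, not the authors' implementation: it assumes routing scores are already available from some training-free router, that each expert contributes an additive weight delta to a shared base layer, and that merging uses softmax weights. The function `compose_layer` and its `top_k` parameter are hypothetical names. Calling it once per layer with that layer's own scores would give a layer-wise composition.

```python
import numpy as np

def compose_layer(x, base_w, expert_deltas, scores, top_k=None):
    """Apply one linear layer to token states x (shape [tokens, d_in]).

    scores: [tokens, n_experts] routing scores (assumed given).
    top_k=None -> softmax-weighted merge of all experts for each token.
    top_k=1    -> hard selection of one expert per token
                  (tied scores may keep more than top_k experts).
    """
    scores = np.asarray(scores, dtype=float)
    if top_k is not None:
        # Keep only each token's top_k experts; mask the rest out.
        kth = np.sort(scores, axis=1)[:, -top_k][:, None]
        scores = np.where(scores >= kth, scores, -np.inf)
    z = scores - scores.max(axis=1, keepdims=True)
    w = np.exp(z)  # masked experts get weight exp(-inf) = 0
    w /= w.sum(axis=1, keepdims=True)

    out = x @ base_w.T  # shared base model path
    for e, delta in enumerate(expert_deltas):
        # Each token mixes in expert e according to its own weight.
        out += w[:, [e]] * (x @ delta.T)
    return out, w

# Two tokens, two experts; each token prefers a different expert.
x = np.array([[1.0, 0.0], [0.0, 1.0]])
base_w = np.eye(2)
deltas = [np.array([[1.0, 0.0], [0.0, 0.0]]),
          np.array([[0.0, 0.0], [0.0, 1.0]])]
scores = np.array([[3.0, 0.0], [0.0, 3.0]])

hard_out, hard_w = compose_layer(x, base_w, deltas, scores, top_k=1)
soft_out, soft_w = compose_layer(x, base_w, deltas, scores)
print(hard_w)
print(soft_w)
```

With `top_k=1` each token uses exactly one expert, while the default blends all experts in proportion to their softmax weights. Which of the two works better is an empirical question that this sketch does not answer.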