🤖 AI Summary
This work addresses the challenge of dynamically composing pre-trained expert language models at zero training cost. We propose a parameter-free, frequency-domain-inspired spectral routing mechanism that analyzes the spectral characteristics of attention weights token-wise and layer-wise during inference, leveraging spectral graph theory to select or weight-merge the most suitable experts in real time, without fine-tuning or gradient computation. Unlike static routing or trainable dynamic approaches, our method achieves the first fully training-free, fine-grained (token- and layer-level) expert composition. Experiments across multi-domain expert tasks demonstrate significant improvements in routing accuracy and average performance over existing zero-training baselines, while keeping inference overhead low.
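To make the summary concrete, the routing idea can be sketched as follows. This is an illustrative toy, not the paper's implementation: the function names (`spectral_signature`, `route`), the choice of the unnormalized graph Laplacian, the use of its smallest eigenvalues as a signature, and the distance-based softmax over experts are all assumptions made for the sketch.

```python
import numpy as np

def spectral_signature(attn, k=4):
    """Spectral signature of one token's attention map, viewed as a weighted graph.

    `attn`: (n, n) attention weights for a single layer/head (hypothetical input).
    Returns the k smallest eigenvalues of the graph Laplacian as a feature vector.
    """
    sym = 0.5 * (attn + attn.T)            # symmetrize: treat attention as an undirected graph
    lap = np.diag(sym.sum(axis=1)) - sym   # unnormalized graph Laplacian L = D - A
    eigvals = np.linalg.eigvalsh(lap)      # real, sorted spectrum (L is symmetric)
    return eigvals[:k]

def route(attn, expert_signatures, temperature=1.0):
    """Training-free routing weights over experts from spectral similarity (illustrative).

    `expert_signatures`: dict mapping expert name -> reference spectral signature,
    assumed to be precomputed offline per expert domain.
    """
    k = len(next(iter(expert_signatures.values())))
    sig = spectral_signature(attn, k=k)
    names = list(expert_signatures)
    # Closer spectra -> higher score; softmax turns scores into merge weights.
    scores = np.array([-np.linalg.norm(sig - expert_signatures[n]) for n in names])
    weights = np.exp(scores / temperature)
    weights /= weights.sum()
    return dict(zip(names, weights))
```

The resulting weights could either select the top-scoring expert (routing) or weight-merge expert outputs per token and per layer; no gradients are computed at any point.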
📝 Abstract
Training large, general-purpose language models poses significant challenges. The growing availability of specialized expert models, fine-tuned from pre-trained models for specific tasks or domains, offers a promising alternative. Leveraging these existing expert models in real-world applications requires effective methods to select or merge the models best suited to a given task. This paper introduces SPECTR, an approach for dynamically composing expert models at each time step during inference. Notably, our method requires no additional training and enables flexible, token- and layer-wise model combinations. Our experimental results demonstrate that SPECTR improves routing accuracy over alternative training-free methods and increases task performance across expert domains.