🤖 AI Summary
Existing LLM routing methods rely on fine-tuned model representations, so every newly added model requires retraining the router, which severely limits scalability. To address this, we propose ICL-Router, a routing framework that uses in-context learning (ICL) to derive universal model capability vectors. It builds query-model joint representations in two semantic-alignment stages: (1) projecting query embeddings into the router's semantic space, and (2) profiling each candidate model through ICL-derived capability vectors, then predicting per-model performance for dynamic routing. Our key contribution is the first use of in-context vectors as plug-and-play, general-purpose representations of model capability, enabling new models to be integrated without retraining or architectural changes. Experiments demonstrate that ICL-Router achieves state-of-the-art routing accuracy on both in-distribution and out-of-distribution benchmarks, significantly improving generalization and scalability over prior approaches.
📝 Abstract
Large language models (LLMs) often exhibit complementary strengths. Model routing harnesses these strengths by dynamically directing each query to the most suitable model, given a candidate model pool. However, routing performance relies on accurate model representations, and adding new models typically requires retraining, limiting scalability. To address these challenges, we propose a novel routing method using in-context vectors to represent model capabilities. The method proceeds in two stages. First, queries are embedded and projected into vectors, with a projector and LLM-based router trained to reconstruct the original queries, aligning vector representations with the router's semantic space. Second, each candidate model is profiled on a query set, and the router learns, based on in-context vectors of queries and each model's performance, to predict whether each model can correctly answer new queries. Extensive experiments demonstrate that our method achieves state-of-the-art routing performance on both in-distribution and out-of-distribution tasks. Moreover, our method allows for seamless integration of new models without retraining the router. The code is available at https://github.com/lalalamdbf/ICL-Router.
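The two-stage procedure in the abstract can be sketched as follows. This is a deliberately simplified, hypothetical stand-in, not the paper's implementation: the paper trains a projector and an LLM-based router, whereas here the embedder is a hashed bag-of-words and the correctness predictor is a nearest-neighbour rule over profiled queries. All names (`embed`, `profile_model`, `predict_correct`, `route`, the example models and `DIM`) are illustrative, not taken from the ICL-Router codebase.

```python
import math
import zlib
from collections import Counter

DIM = 512  # hashed bag-of-words dimension (illustrative choice)

def embed(query: str) -> list[float]:
    """Toy stand-in for the paper's query embedder + trained projector (stage 1)."""
    vec = [0.0] * DIM
    for tok, cnt in Counter(query.lower().split()).items():
        vec[zlib.crc32(tok.encode()) % DIM] += cnt
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]

def cosine(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))

def profile_model(results: list[tuple[str, bool]]) -> list[tuple[list[float], bool]]:
    """Stage-2 profiling: store (query vector, answered correctly?) pairs for one model."""
    return [(embed(q), ok) for q, ok in results]

def predict_correct(profile, query_vec, k: int = 1) -> float:
    """Estimate P(model answers correctly) from the k most similar profiled queries.
    The real router conditions an LLM on in-context vectors; this is a k-NN stand-in."""
    sims = sorted(((cosine(v, query_vec), ok) for v, ok in profile), reverse=True)
    top = sims[:k]
    return sum(ok for _, ok in top) / len(top)

def route(profiles: dict[str, list], query: str) -> str:
    """Send the query to the candidate model with the highest predicted correctness."""
    qv = embed(query)
    return max(profiles, key=lambda name: predict_correct(profiles[name], qv))

# Hypothetical candidate pool: one model profiled as strong on math, one on code.
PROFILES = {
    "math-model": profile_model([
        ("solve the equation x", True),
        ("integrate the function", True),
        ("write python code", False),
    ]),
    "code-model": profile_model([
        ("write python code", True),
        ("debug this function", True),
        ("solve the equation x", False),
    ]),
}
```

With these toy profiles, a math-flavoured query routes to `math-model` and a coding query to `code-model`. The sketch also mirrors the paper's scalability claim in miniature: adding a new model means adding one `profile_model(...)` entry to the pool, with no change to the routing logic itself.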