A Unified Approach to Routing and Cascading for LLMs

📅 2024-10-14

🏛️ arXiv.org

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Existing model selection strategies for LLM agents lack theoretical optimality guarantees and clear applicability conditions, and they treat routing and cascading as disconnected paradigms. Method: We propose a unified cascade-routing framework that rigorously establishes the individual optimality of single-step routing and stepwise cascading, and derives a provably optimal joint scheduling policy; we identify quality estimation as a fundamental prerequisite for efficient model selection. Contribution/Results: Through theoretical analysis, algorithm design, and extensive evaluation across multiple benchmarks, our approach significantly outperforms pure routing and pure cascading in cost-performance trade-offs. We systematically characterize the operational boundaries of both paradigms and quantify their synergistic gains. This work delivers the first solution for dynamic model scheduling in LLM serving that simultaneously offers rigorous theoretical guarantees and strong empirical efficacy.

📝 Abstract

The availability of a wide range of large language models (LLMs) embedded in various agentic systems has significantly increased the potential of model selection strategies to improve the cost-performance tradeoff. Existing strategies involve either routing, where a single model is chosen per query, or cascading, which sequentially runs increasingly large models until a satisfactory answer is found. However, current approaches face three key limitations: they (1) lack formal proofs of optimality, (2) fail to identify the conditions under which these strategies are most effective at improving the cost-performance tradeoff, and (3) are unable to combine both paradigms for further improvements. To address these issues, we first derive a novel optimal strategy for cascading and prove the optimality of an existing routing strategy. Further, we propose cascade routing, a unified framework that integrates routing and cascading into a theoretically optimal strategy. Through our analysis, we identify good quality estimators as the critical factor for the success of model selection paradigms. Finally, in our experiments, we show that cascade routing consistently outperforms the individual approaches by a large margin and we analyze quality estimators to determine when routing and/or cascading are useful paradigms for model selection.
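To make the two baseline paradigms described in the abstract concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the paper's algorithm: the `Model` class, the `estimate_quality` callable, the linear `tradeoff` weight, and the fixed `threshold` stopping rule are all hypothetical choices made for this example. In particular, a real cascade would judge the answer a model actually produced, whereas this sketch reuses the same estimator as a stand-in.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Model:
    name: str
    cost: float                                # hypothetical per-query cost
    estimate_quality: Callable[[str], float]   # hypothetical quality estimate in [0, 1]


def route(query: str, models: List[Model], tradeoff: float) -> Model:
    """Routing: choose exactly one model per query.

    Assumption: quality and cost are traded off linearly via `tradeoff`.
    """
    if not models:
        raise ValueError("at least one model is required")
    return max(models, key=lambda m: m.estimate_quality(query) - tradeoff * m.cost)


def cascade(query: str, models: List[Model], threshold: float) -> Tuple[Model, float]:
    """Cascading: run models from cheapest to most expensive.

    Stops as soon as the estimated quality reaches `threshold` and returns
    the last model run together with the total cost paid along the way.
    """
    if not models:
        raise ValueError("at least one model is required")
    total_cost = 0.0
    chosen = None
    for model in sorted(models, key=lambda m: m.cost):
        chosen = model
        total_cost += model.cost               # every model that runs is paid for
        if model.estimate_quality(query) >= threshold:
            break                              # answer judged satisfactory
    return chosen, total_cost
```

The sketch shows the structural difference: routing pays for one model but relies entirely on the estimate made before running anything, while cascading can stop early on easy queries but accumulates the cost of every model it tries. Both depend on the quality estimator, which is the factor the abstract identifies as critical.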

Problem

Research questions and friction points this paper is trying to address.

Optimal strategy for cascading

Unified routing and cascading framework

Quality estimators for model selection

Innovation

Methods, ideas, or system contributions that make the work stand out.

Unified framework for model selection

Proves optimality of an existing routing strategy

Introduces cascade routing technique

🔎 Similar Papers

Intelligent Router for LLM Workloads: Improving Performance Through Workload-Aware Load Balancing