CARROT: A Cost Aware Rate Optimal Router

📅 2025-02-05
🤖 AI Summary
This work addresses the performance–cost trade-off in large language model (LLM) query routing by proposing a lightweight, cost-aware router with provable minimax rate optimality. The router selects a model for each query from joint estimates of every candidate model's performance and cost. The work also constructs SPROUT, a benchmark comprising diverse real-world queries and state-of-the-art LLMs. The theoretical analysis is carried out within the statistical learning framework of minimax rate optimality. The empirical evaluation uses SPROUT, RouterBench, and Open LLM Leaderboard v2. Results demonstrate that the router reduces average invocation cost by 23%–41% while preserving response quality, with inference latency under 5 ms, outperforming existing approaches.

📝 Abstract
With the rapid growth in the number of Large Language Models (LLMs), there has been recent interest in LLM routing, that is, directing each query to the cheapest LLM that can deliver a suitable response. Following this line of work, we introduce CARROT, a Cost AwaRe Rate Optimal rouTer that can select models based on any desired trade-off between performance and cost. Given a query, CARROT selects a model based on estimates of the models' cost and performance. Its simplicity lends CARROT computational efficiency, while our theoretical analysis demonstrates minimax rate-optimality in its routing performance. Alongside CARROT, we also introduce the Smart Price-aware Routing (SPROUT) dataset to facilitate routing on a wide spectrum of queries with the latest state-of-the-art LLMs. Using SPROUT and prior benchmarks such as RouterBench and Open LLM Leaderboard v2, we empirically validate CARROT's performance against several alternative routers.
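
The selection step described in the abstract can be illustrated with a minimal sketch. It assumes that per-model performance and cost estimates for a query are already available, and it combines them with a simple linear score weighted by a trade-off parameter `mu`. The names `ModelEstimate`, `route`, and `mu`, the example numbers, and the exact scoring rule are all hypothetical illustrations, not the paper's definitions.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class ModelEstimate:
    """Hypothetical per-query estimates for one candidate model."""
    name: str
    performance: float  # estimated response quality, assumed to lie in [0, 1]
    cost: float         # estimated cost of answering the query (e.g. dollars)


def route(estimates: Sequence[ModelEstimate], mu: float) -> str:
    """Pick the model with the best performance-cost trade-off.

    Assumed scoring rule (illustrative only):
        score = (1 - mu) * performance - mu * cost
    mu = 0 ignores cost entirely; mu = 1 ignores performance entirely.
    """
    if not 0.0 <= mu <= 1.0:
        raise ValueError("mu must be in [0, 1]")
    if not estimates:
        raise ValueError("at least one candidate model is required")
    best = max(estimates, key=lambda e: (1.0 - mu) * e.performance - mu * e.cost)
    return best.name


# Made-up estimates for a single query.
candidates = [
    ModelEstimate("small-model", performance=0.60, cost=0.02),
    ModelEstimate("large-model", performance=0.90, cost=0.50),
]

quality_first = route(candidates, mu=0.0)  # only performance matters
balanced = route(candidates, mu=0.5)       # equal weight on both terms
cost_first = route(candidates, mu=1.0)     # only cost matters
```

Sweeping `mu` from 0 to 1 traces out different operating points between always choosing the strongest model and always choosing the cheapest one, which is the sense in which a single router can serve any desired trade-off.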
Problem

Research questions and friction points this paper is trying to address.

Optimizes LLM query routing
Balances cost and performance
Introduces SPROUT dataset
Innovation

Methods, ideas, or system contributions that make the work stand out.

Cost-aware LLM routing
Performance-cost trade-off
Minimax rate-optimality
Authors

Seamus Somerstep
University of Michigan
Algorithmic Fairness, LLM Alignment

Felipe Maia Polo
University of Michigan
AI evaluation, statistics, machine learning

Allysson Flavio Melo de Oliveira
IBM Research, MIT-IBM Watson AI Lab

Prattyush Mangal
IBM Research, MIT-IBM Watson AI Lab

Mírian Silva
IBM Research, MIT-IBM Watson AI Lab, Federal University of Minas Gerais

Onkar Bhardwaj
IBM Research, MIT-IBM Watson AI Lab

M. Yurochkin
IBM Research, MIT-IBM Watson AI Lab

Subha Maity
University of Waterloo
Transfer learning, Distribution shift, Algorithmic fairness