Learning to Route LLMs from Implicit Cost-Performance Preferences via Meta-Learning

📅 2026-06-04

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses a limitation of existing large language model (LLM) routing methods: they do not accommodate users' individual cost–performance trade-offs. The authors propose a preference-aware routing paradigm that learns each user's implicit preferences from a small number of interactions. They formulate preference profiles as distinct tasks in a contextual bandit setting and introduce MetaRouter, a meta-learning framework for preference-aware model selection. MetaRouter outperforms strong baselines on both in-distribution and out-of-distribution tasks. It also learns user preferences efficiently, remains robust when the set of routable LLMs changes, and scales to routing among multiple models.

📝 Abstract

Large language models (LLMs) present a trade-off between performance and cost: more powerful models incur greater expense. LLM routing aims to reduce expense while maintaining performance by sending each query to the most suitable model. However, existing methods do not perform well across users' differing cost-performance preferences. To address this gap, we introduce a novel perceptive LLM routing paradigm for personalized, user-centric cost-performance optimization, which efficiently learns users' implicit preferences from only a few interactions. To handle the challenge of heterogeneous user needs, we formulate preference profiles as a set of distinct tasks in a contextual bandit setting and propose MetaRouter, a meta-learning framework designed for preference-aware LLM routing. Experimental results show that MetaRouter outperforms strong baselines on both in-distribution and out-of-distribution tasks. Furthermore, it exhibits high efficiency in learning user preferences, robustness to changes in the routable LLMs, and scalability to multi-model routing.
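
To make the cost-performance trade-off concrete, here is a minimal Python sketch of preference-dependent routing. It is not MetaRouter and does not use meta-learning or a contextual bandit: it assumes a hypothetical linear utility (quality minus a user-specific cost weight times cost) and recovers that weight from a few observed user choices by a simple grid search. The names `Candidate`, `utility`, `route`, and `infer_cost_weight`, and all numeric values, are illustrative assumptions that do not come from the paper.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class Candidate:
    """A routable model with hypothetical, pre-estimated scores."""
    name: str
    quality: float  # assumed predicted response quality in [0, 1]
    cost: float     # assumed normalized cost in [0, 1]


def utility(c: Candidate, cost_weight: float) -> float:
    # Assumed linear trade-off: a higher cost_weight means a more cost-sensitive user.
    return c.quality - cost_weight * c.cost


def route(candidates: Sequence[Candidate], cost_weight: float) -> Candidate:
    # Send the query to the candidate with the highest utility for this user.
    return max(candidates, key=lambda c: utility(c, cost_weight))


def infer_cost_weight(
    choices: List[Tuple[Sequence[Candidate], str]],
    grid: Sequence[float],
) -> float:
    # Stand-in for preference learning: pick the grid value whose routing
    # decisions agree with the most observed user choices (first one on ties).
    def agreement(w: float) -> int:
        return sum(route(cands, w).name == chosen for cands, chosen in choices)

    return max(grid, key=agreement)


if __name__ == "__main__":
    models = [Candidate("small", 0.6, 0.1), Candidate("large", 0.9, 0.8)]
    observed = [(models, "small")]  # the user preferred the cheaper model once
    w = infer_cost_weight(observed, [0.0, 0.25, 0.5, 1.0])
    print(w, route(models, w).name)
```

Under these assumptions, a single observed choice is enough to separate cost-sensitive users from quality-focused ones. A learned router would replace the fixed scores and the grid search with estimates conditioned on the query and on the user's interaction history.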

Problem

Research questions and friction points this paper is trying to address.

LLM routing

cost-performance trade-off

user preferences

personalized optimization

implicit preferences

Innovation

Methods, ideas, or system contributions that make the work stand out.

LLM routing

meta-learning

cost-performance trade-off

contextual bandit

personalized optimization

🔎 Similar Papers

RouteLLM: Learning to Route LLMs with Preference Data

2024-06-26, arXiv.org, Citations: 33

MetaLLM: A High-performant and Cost-efficient Dynamic Framework for Wrapping LLMs

2024-07-15, arXiv.org, Citations: 6