🤖 AI Summary
This work addresses the inefficiency of static model allocation in graph-structured multi-agent systems, which wastes computational resources on simple subtasks and struggles to balance cost against performance. To this end, we propose CASTER—a lightweight, context-aware routing strategy that dynamically assesses task difficulty by fusing semantic embeddings and graph-structural meta-features through a dual-signal router, thereby selecting a model of appropriate capability for each subtask. CASTER employs a self-optimizing training paradigm that evolves from cold start to iterative refinement, leveraging LLM-as-a-Judge evaluation and a self-supervised negative-feedback mechanism to continuously improve routing decisions. Experiments across software engineering, data analysis, scientific discovery, and cybersecurity demonstrate that CASTER achieves success rates comparable to full-capability model baselines while reducing inference costs by up to 72.4%, significantly outperforming heuristic routing and FrugalGPT.
📝 Abstract
Graph-based Multi-Agent Systems (MAS) enable complex cyclic workflows but suffer from inefficient static model allocation, where deploying strong models uniformly wastes computation on trivial sub-tasks. We propose CASTER (Context-Aware Strategy for Task Efficient Routing), a lightweight router for dynamic model selection in graph-based MAS. CASTER employs a Dual-Signal Router that combines semantic embeddings with structural meta-features to estimate task difficulty. During training, the router self-optimizes through a Cold Start to Iterative Evolution paradigm, learning from its own routing failures via on-policy negative feedback. Experiments using LLM-as-a-Judge evaluation across Software Engineering, Data Analysis, Scientific Discovery, and Cybersecurity demonstrate that CASTER reduces inference cost by up to 72.4% compared to strong-model baselines while matching their success rates, and consistently outperforms both heuristic routing and FrugalGPT across all domains.
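To make the dual-signal idea concrete, here is a minimal sketch of a router that fuses a sub-task's semantic embedding with graph-structural meta-features into a difficulty score and picks a model tier accordingly. All names, features, weights, and thresholds below are illustrative assumptions, not the paper's actual architecture (which is a learned router, not a hand-weighted one).

```python
# Hypothetical dual-signal routing sketch: fuse a semantic embedding of the
# sub-task with graph-structural meta-features, estimate difficulty, and route
# easy sub-tasks to a cheap model tier and hard ones to a strong tier.
# All weights and thresholds here are made up for illustration.
from dataclasses import dataclass


@dataclass
class SubTask:
    embedding: list[float]  # semantic embedding of the sub-task prompt
    in_degree: int          # graph meta-feature: number of upstream agents
    depth: int              # graph meta-feature: distance from the root node
    retries: int            # graph meta-feature: prior failures on this node


def difficulty_score(task: SubTask) -> float:
    """Fuse both signals into one scalar difficulty estimate."""
    semantic = sum(task.embedding) / len(task.embedding)  # crude semantic signal
    structural = 0.2 * task.in_degree + 0.1 * task.depth + 0.5 * task.retries
    return semantic + structural


def route(task: SubTask, threshold: float = 1.0) -> str:
    """Pick a model tier; hard sub-tasks go to the strong (expensive) model."""
    return "strong-model" if difficulty_score(task) >= threshold else "cheap-model"


easy = SubTask(embedding=[0.1, 0.2, 0.1], in_degree=1, depth=1, retries=0)
hard = SubTask(embedding=[0.6, 0.7, 0.8], in_degree=3, depth=4, retries=1)
print(route(easy))  # cheap-model
print(route(hard))  # strong-model
```

In CASTER proper, the fusion and scoring are learned (cold start followed by iterative refinement on routing failures) rather than hand-weighted as above, but the interface is the same: sub-task signals in, model tier out.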