AgentBalance: Backbone-then-Topology Design for Cost-Effective Multi-Agent Systems under Budget Constraints

📅 2025-12-12

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Problem: In budget-constrained large language model (LLM)-based multi-agent systems (MAS), token cost and end-to-end latency are rarely modeled and optimized as explicit budgets. Method: The paper proposes AgentBalance, a “backbone-first, then topology” design that operates under explicit token-cost and latency budgets. It first constructs agents with heterogeneous backbones through LLM pool construction, pool selection, and role-backbone matching, and then generates the communication topology through agent representation learning, gating, and latency-aware topology synthesis. Results: With 14 candidate LLM backbones, AgentBalance achieves up to 10% higher task performance under matched token-cost budgets and up to 22% under matched latency budgets, with strong AUC on performance-versus-budget curves. It also works as a plug-in for existing MAS and generalizes to unseen LLMs.

📝 Abstract

Large Language Model (LLM)-based multi-agent systems (MAS) are becoming indispensable building blocks for web-scale applications such as web search, social network analytics, and online customer support, where cost-effectiveness is increasingly the primary constraint for large-scale deployment. While recent work improves MAS cost-effectiveness by shaping inter-agent communication topologies and selecting agent backbones, it rarely models and optimizes under explicit token-cost and latency budgets that reflect deployment constraints. This often leads to topology-first designs and suboptimal cost-effectiveness when budgets are binding. We present AgentBalance, a framework for constructing cost-effective MAS under explicit token-cost and latency budgets via a backbone-then-topology design. AgentBalance first performs backbone-oriented agent generation, constructing agents with heterogeneous backbones through LLM pool construction, pool selection, and role-backbone matching. It then performs adaptive MAS topology generation, guiding inter-agent communication via agent representation learning, gating, and latency-aware topology synthesis. Experiments on benchmarks with 14 candidate LLM backbones show that AgentBalance achieves up to 10% and 22% performance gains under matched token-cost and latency budgets, respectively, and yields strong AUC on performance-versus-budget curves across benchmarks. AgentBalance also functions as a plug-in for existing MAS, improving performance under the same token-cost and latency constraints, and it generalizes well to unseen LLMs for practical, budget-aware deployment. Code: https://github.com/usail-hkust/AgentBalance
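The abstract reports AUC on performance-versus-budget curves but does not spell out how that area is computed. One common reading is the trapezoidal area under the (budget, score) points, divided by the budget span so that curves over different ranges are comparable. The sketch below follows that reading; the function name `budget_auc`, the normalization, and the example numbers are illustrative assumptions, not values or definitions taken from the paper.

```python
def budget_auc(budgets, scores):
    """Normalized area under a performance-versus-budget curve.

    Assumption: trapezoidal rule over the measured points, divided by the
    budget span. The paper may define its AUC differently.
    """
    if len(budgets) != len(scores) or len(budgets) < 2:
        raise ValueError("need at least two (budget, score) points")
    # Sort by budget so the trapezoids are taken left to right.
    points = sorted(zip(budgets, scores))
    span = points[-1][0] - points[0][0]
    if span <= 0:
        raise ValueError("budgets must cover a non-zero range")
    area = 0.0
    for (b0, s0), (b1, s1) in zip(points, points[1:]):
        area += (b1 - b0) * (s0 + s1) / 2.0
    return area / span


# Hypothetical curve: accuracy measured at three token budgets.
example_auc = budget_auc([1.0, 2.0, 4.0], [0.4, 0.6, 0.8])
print(f"normalized AUC: {example_auc:.3f}")
```

Under this normalization, a system that scores the same at every budget has an AUC equal to that score, so a higher AUC means better performance averaged over the budget range rather than at a single operating point.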

Problem

Research questions and friction points this paper is trying to address.

Optimizes multi-agent systems under explicit token-cost and latency budgets.

Designs cost-effective agent backbones before communication topologies.

Improves performance in budget-constrained large-scale web applications.

Innovation

Methods, ideas, or system contributions that make the work stand out.

Backbone-then-topology design for budget-constrained multi-agent systems

Heterogeneous agent generation via LLM pool selection and matching

Adaptive topology synthesis using representation learning and gating
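To make the backbone-then-topology ordering concrete, here is a toy sketch of the two stages. It is not AgentBalance's algorithm: the paper uses learned agent representations and gating, whereas this sketch substitutes brute-force search for role-backbone matching and greedy edge pruning for latency-aware topology synthesis. All names (`match_backbones`, `synthesize_topology`, `critical_path`), the gate scores, and the quality, cost, and latency numbers are hypothetical.

```python
from itertools import product


def match_backbones(roles, backbones, token_budget):
    """Stage 1 (toy): pick one backbone per role, maximizing summed quality
    while the summed token cost stays within token_budget.

    backbones maps name -> (quality, token_cost, latency); values are made up.
    Returns None if no assignment fits the budget.
    """
    best, best_quality = None, float("-inf")
    for combo in product(backbones, repeat=len(roles)):
        cost = sum(backbones[b][1] for b in combo)
        quality = sum(backbones[b][0] for b in combo)
        if cost <= token_budget and quality > best_quality:
            best, best_quality = dict(zip(roles, combo)), quality
    return best


def critical_path(order, edges, latency):
    """Longest-latency path through a DAG whose edges respect `order`."""
    finish = {}
    for node in order:
        start = max((finish[u] for u, v in edges if v == node), default=0.0)
        finish[node] = start + latency[node]
    return max(finish.values())


def synthesize_topology(order, gate_scores, latency, latency_budget, threshold=0.5):
    """Stage 2 (toy): keep edges whose gate score passes the threshold, then
    drop the weakest edges until the critical path fits the latency budget."""
    position = {node: i for i, node in enumerate(order)}
    edges = [
        (u, v) for (u, v), score in gate_scores.items()
        if score >= threshold and position[u] < position[v]  # forward edges only
    ]
    edges.sort(key=gate_scores.get, reverse=True)  # strongest first
    while edges and critical_path(order, edges, latency) > latency_budget:
        edges.pop()  # remove the weakest remaining edge
    return edges


roles = ["planner", "coder", "reviewer"]
backbones = {"small": (0.5, 1.0, 1.0), "large": (0.75, 4.0, 3.0)}
assignment = match_backbones(roles, backbones, token_budget=6.0)
latency = {role: backbones[assignment[role]][2] for role in roles}
gates = {
    ("planner", "coder"): 0.9,
    ("coder", "reviewer"): 0.8,
    ("planner", "reviewer"): 0.3,
}
topology = synthesize_topology(roles, gates, latency, latency_budget=4.5)
print(assignment, topology)
```

The point of the ordering is that stage 2 consumes the per-agent latencies fixed in stage 1, so the topology is shaped around backbones that already fit the token budget instead of the reverse. Brute-force matching is exponential in the number of roles and only suits a toy example like this one.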
