AgentBalance: Backbone-then-Topology Design for Cost-Effective Multi-Agent Systems under Budget Constraints

📅 2025-12-12
🤖 AI Summary
Problem: LLM-based multi-agent systems (MAS) are rarely designed under explicit token-cost and latency budgets, so they can be cost-ineffective when those budgets are binding. Method: The paper proposes AgentBalance, a backbone-then-topology framework. It first builds agents with heterogeneous backbones through LLM pool construction, pool selection, and role-backbone matching, then generates the communication topology through agent representation learning, gating, and latency-aware topology synthesis. Results: On benchmarks with 14 candidate LLM backbones, AgentBalance achieves performance gains of up to 10% under matched token-cost budgets and up to 22% under matched latency budgets, with strong AUC on performance-versus-budget curves. It also works as a plug-in for existing MAS and generalizes to unseen LLMs.

📝 Abstract
Large Language Model (LLM)-based multi-agent systems (MAS) are becoming indispensable building blocks for web-scale applications such as web search, social network analytics, and online customer support, where cost-effectiveness is increasingly the primary constraint for large-scale deployment. While recent work improves MAS cost-effectiveness by shaping inter-agent communication topologies and selecting agent backbones, it rarely models and optimizes under explicit token-cost and latency budgets that reflect deployment constraints. This often leads to topology-first designs and suboptimal cost-effectiveness when budgets are binding. We present AgentBalance, a framework for constructing cost-effective MAS under explicit token-cost and latency budgets via a backbone-then-topology design. AgentBalance first performs backbone-oriented agent generation, constructing agents with heterogeneous backbones through LLM pool construction, pool selection, and role-backbone matching. It then performs adaptive MAS topology generation, guiding inter-agent communication via agent representation learning, gating, and latency-aware topology synthesis. Experiments on benchmarks with 14 candidate LLM backbones show that AgentBalance achieves up to 10% and 22% performance gains under matched token-cost and latency budgets, respectively, and yields strong AUC on performance-versus-budget curves across benchmarks. AgentBalance also functions as a plug-in for existing MAS, improving performance under the same token-cost and latency constraints, and it generalizes well to unseen LLMs for practical, budget-aware deployment. Code: https://github.com/usail-hkust/AgentBalance
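The abstract reports AUC on performance-versus-budget curves without defining it here. As a rough illustration only, the sketch below computes a normalized area under such a curve with the trapezoidal rule. The function name `budget_curve_auc`, the normalization by budget span, and the sample numbers are assumptions for illustration, not the paper's evaluation code or results.

```python
def budget_curve_auc(budgets, scores):
    """Normalized area under a performance-versus-budget curve.

    Assumption: AUC is taken with the trapezoidal rule and divided by the
    budget span, so the result is an average score over the budget range.
    The paper may define its AUC differently.
    """
    if len(budgets) != len(scores) or len(budgets) < 2:
        raise ValueError("need at least two (budget, score) points")
    # Sort points by budget so the curve is traversed left to right.
    points = sorted(zip(budgets, scores))
    span = points[-1][0] - points[0][0]
    if span == 0:
        raise ValueError("budgets must cover a non-zero range")
    area = 0.0
    for (b0, s0), (b1, s1) in zip(points, points[1:]):
        # Trapezoid between two neighboring budget levels.
        area += (b1 - b0) * (s0 + s1) / 2.0
    return area / span


# Made-up example: accuracy measured at three token budgets.
auc = budget_curve_auc([1000, 2000, 3000], [0.5, 0.7, 0.9])
```

A single normalized number like this makes it possible to compare two systems across the whole budget range instead of at one budget point.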
Problem

Research questions and friction points this paper is trying to address.

Optimizes multi-agent systems under explicit token-cost and latency budgets.
Selects and matches cost-effective agent backbones before generating communication topologies.
Improves performance in budget-constrained large-scale web applications.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Backbone-then-topology design for budget-constrained multi-agent systems
Heterogeneous agent generation via LLM pool selection and matching
Adaptive topology synthesis using representation learning and gating
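To make the backbone-then-topology ordering concrete, here is a toy sketch: backbones are assigned to roles under a token-cost budget first, and only then are communication edges gated and pruned to fit a latency budget. Everything in it is a simplifying assumption (the `Backbone` fields, the greedy upgrade rule, fixed gate scores in place of learned ones, and a critical-path latency model); it is not AgentBalance's actual algorithm.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Backbone:
    name: str
    quality: float  # hypothetical capability score, higher is better
    cost: float     # hypothetical token cost per call
    latency: float  # hypothetical seconds per call


def match_backbones(roles, pool, cost_budget):
    """Stage 1 (assumed greedy rule): start cheap, upgrade roles in order."""
    cheapest = min(pool, key=lambda b: b.cost)
    assignment = {role: cheapest for role in roles}
    spent = cheapest.cost * len(roles)
    if spent > cost_budget:
        raise ValueError("budget too small for the cheapest backbone")
    for role in roles:  # roles are assumed to be ordered by importance
        current = assignment[role]
        # Backbones that still fit the budget if swapped in for this role.
        affordable = [b for b in pool
                      if spent - current.cost + b.cost <= cost_budget]
        best = max(affordable, key=lambda b: b.quality)
        spent += best.cost - current.cost
        assignment[role] = best
    return assignment


def critical_path_latency(order, edges, assignment):
    """Longest-path latency of a DAG; `order` must be a topological order."""
    finish = {}
    for node in order:
        start = max((finish[u] for (u, v) in edges if v == node), default=0.0)
        finish[node] = start + assignment[node].latency
    return max(finish.values())


def synthesize_topology(order, gate_scores, assignment, latency_budget,
                        threshold=0.5):
    """Stage 2: keep gated edges, then drop the weakest until latency fits."""
    edges = {e for e, s in gate_scores.items() if s >= threshold}
    while edges and critical_path_latency(order, edges, assignment) > latency_budget:
        edges.remove(min(edges, key=lambda e: gate_scores[e]))
    return edges


# Made-up example with two backbones and three roles.
pool = [Backbone("small", 0.5, 1.0, 1.0), Backbone("large", 0.9, 5.0, 3.0)]
roles = ["planner", "coder", "reviewer"]
assignment = match_backbones(roles, pool, cost_budget=8.0)
gates = {("planner", "coder"): 0.9, ("coder", "reviewer"): 0.8,
         ("planner", "reviewer"): 0.3}
edges = synthesize_topology(roles, gates, assignment, latency_budget=4.5)
```

In this toy run the planner gets the larger backbone, the remaining roles stay on the smaller one, and the topology keeps only the planner-to-coder edge. Fixing backbones first means the latency of every agent is known before any edge is chosen, which is what lets the second stage reason about end-to-end latency.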
Shuowei Cai
HKUST(GZ)
Federated Learning.
Yansong Ning
The Hong Kong University of Science and Technology (Guangzhou)
Hao Liu
The Hong Kong University of Science and Technology (Guangzhou)