MasHost Builds It All: Autonomous Multi-Agent System Directed by Reinforcement Learning

📅 2025-06-10

🤖 AI Summary

Problem: Existing multi-agent systems (MAS) rely heavily on manually crafted rules, which introduces human bias and limits autonomy. Method: This paper proposes MasHost, which the authors describe as, to their knowledge, the first reinforcement learning (RL)-driven framework for autonomous, query-adaptive MAS graph construction. It formalizes MAS construction as a graph-search problem, jointly samples agent roles and their interactions through a unified probabilistic sampling mechanism, introduces “component rationality” as a design principle alongside accuracy and efficiency, and develops Hierarchical Relative Policy Optimization (HRPO), an RL strategy that combines group-relative advantages with action-wise rewards. Contribution/Results: On six benchmarks, the authors report that MasHost consistently outperforms most competitive baselines in effectiveness, efficiency, and structural rationality.

📝 Abstract

Large Language Model (LLM)-driven multi-agent systems (Mas) have recently emerged as a powerful paradigm for tackling complex real-world tasks. However, existing Mas construction methods typically rely on manually crafted interaction mechanisms or heuristic rules, which introduce human biases and constrain autonomy. Even with recent advances in adaptive Mas construction, existing systems remain largely semi-autonomous. In this work, we propose MasHost, a Reinforcement Learning (RL)-based framework for autonomous and query-adaptive Mas design. By formulating Mas construction as a graph search problem, MasHost jointly samples agent roles and their interactions through a unified probabilistic sampling mechanism. Beyond the accuracy and efficiency objectives pursued in prior work, we introduce component rationality as an additional design principle for Mas. To achieve this multi-objective optimization, we propose Hierarchical Relative Policy Optimization (HRPO), a novel RL strategy that integrates group-relative advantages and action-wise rewards. To our knowledge, MasHost is the first RL-driven framework for autonomous Mas graph construction. Extensive experiments on six benchmarks demonstrate that MasHost consistently outperforms most competitive baselines, validating its effectiveness, efficiency, and structural rationality.
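
To make the graph-search framing more concrete, the following is a minimal Python sketch of growing a Mas graph by sampling a role and a connection together from a single action distribution. It is not the paper's implementation. The role names, the stop action, the `policy` callable, and the tree-only growth rule are all hypothetical simplifications chosen for illustration. In a real system the policy would presumably be a learned model conditioned on the query.

```python
import random

ROLES = ["planner", "coder", "reviewer"]  # hypothetical role names
STOP = "<stop>"                           # hypothetical "finish construction" action


def sample_mas_graph(policy, max_agents=4, rng=None):
    """Grow a graph by sampling (role, parent) actions from one distribution.

    `policy(nodes, edges, action)` returns a non-negative weight for an action;
    it stands in for a learned, query-conditioned policy (an assumption).
    """
    rng = rng or random.Random(0)
    nodes, edges = [], []
    while len(nodes) < max_agents:
        # Unified action space: each action picks a role AND where to attach it.
        parents = list(range(len(nodes))) if nodes else [None]
        actions = [(role, parent) for role in ROLES for parent in parents]
        if nodes:
            actions.append(STOP)  # stopping is only allowed once an agent exists
        weights = [policy(nodes, edges, action) for action in actions]
        action = rng.choices(actions, weights=weights, k=1)[0]
        if action == STOP:
            break
        role, parent = action
        nodes.append(role)
        if parent is not None:
            edges.append((parent, len(nodes) - 1))  # directed edge parent -> new agent
    return nodes, edges


def uniform_policy(nodes, edges, action):
    """Placeholder policy that treats every action as equally likely."""
    return 1.0


if __name__ == "__main__":
    print(sample_mas_graph(uniform_policy, rng=random.Random(0)))
```

Because the role and the attachment point are drawn in one sampling step, a single policy is responsible for both decisions, which is the property the abstract attributes to the unified sampling mechanism. The sketch only produces trees; richer topologies would need a larger action space.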

Problem

Research questions and friction points this paper is trying to address.

Autonomous multi-agent system design without human biases

Optimizing agent roles and interactions via RL

Ensuring component rationality in multi-agent systems

Innovation

Methods, ideas, or system contributions that make the work stand out.

RL-driven autonomous multi-agent system design

Unified probabilistic sampling of agent roles and their interactions

Hierarchical Relative Policy Optimization strategy
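
The abstract describes HRPO only as integrating group-relative advantages with action-wise rewards, so the Python sketch below is one plausible reading rather than the authors' formula. It assumes a group of sampled constructions for the same query, standardizes each trajectory's final return within the group (in the style of group-relative policy optimization), and adds a weighted per-action reward. The function names, the additive combination, and the weight `beta` are hypothetical.

```python
from statistics import mean, pstdev


def group_relative_advantages(returns, eps=1e-8):
    """Standardize each trajectory's return against its sampling group."""
    mu = mean(returns)
    sigma = pstdev(returns)
    return [(r - mu) / (sigma + eps) for r in returns]


def hierarchical_advantages(returns, step_rewards, beta=0.5):
    """Combine a trajectory-level advantage with per-action rewards.

    returns:      one final return per sampled trajectory in the group
    step_rewards: one list of action-wise rewards per trajectory
    beta:         assumed weight on the action-wise term (hypothetical)
    """
    group_adv = group_relative_advantages(returns)
    return [
        [g + beta * r for r in steps]
        for g, steps in zip(group_adv, step_rewards)
    ]


if __name__ == "__main__":
    # Two sampled constructions for one query: the first succeeded, the second failed.
    print(hierarchical_advantages([1.0, 0.0], [[0.2, 0.0], [0.0]]))
```

Under this reading, the group-level term tells the policy which whole constructions were better than their peers, while the action-wise term can credit or penalize individual additions, for example an agent that contributes nothing to the final answer.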
