X-MAS: Towards Building Multi-Agent Systems with Heterogeneous LLMs

📅 2025-05-22
🤖 AI Summary
Existing LLM-driven multi-agent systems (MAS) predominantly adopt a homogeneous architecture in which a single LLM powers all agents, so the system's intelligence is capped by that one model. This paper explores X-MAS, a heterogeneous LLM-driven MAS in which agents are powered by different LLMs suited to their domain and function. The contributions are threefold: (1) X-MAS-Bench, a testbed that evaluates LLMs across domains and MAS-related functions; (2) an empirical study of 27 LLMs across 5 domains (21 test sets) and 5 functions, totaling over 1.7 million evaluations, that identifies the best model for each domain-function combination; and (3) a demonstration that moving from a homogeneous to a heterogeneous MAS improves performance by up to 8.4% on MATH (chatbot-only setting) and by 47% on AIME (mixed chatbot-reasoner setting) without redesigning the system's structure.

📝 Abstract
LLM-based multi-agent systems (MAS) extend the capabilities of single LLMs by enabling cooperation among multiple specialized agents. However, most existing MAS frameworks rely on a single LLM to drive all agents, constraining the system's intelligence to the limit of that model. This paper explores the paradigm of heterogeneous LLM-driven MAS (X-MAS), where agents are powered by diverse LLMs, elevating the system's potential to the collective intelligence of diverse LLMs. We introduce X-MAS-Bench, a comprehensive testbed designed to evaluate the performance of various LLMs across different domains and MAS-related functions. As an extensive empirical study, we assess 27 LLMs across 5 domains (encompassing 21 test sets) and 5 functions, conducting over 1.7 million evaluations to identify optimal model selections for each domain-function combination. Building on these findings, we demonstrate that transitioning from homogeneous to heterogeneous LLM-driven MAS can significantly enhance system performance without requiring structural redesign. Specifically, in a chatbot-only MAS scenario, the heterogeneous configuration yields up to 8.4% performance improvement on the MATH dataset. In a mixed chatbot-reasoner scenario, the heterogeneous MAS could achieve a remarkable 47% performance boost on the AIME dataset. Our results underscore the transformative potential of heterogeneous LLMs in MAS, highlighting a promising avenue for advancing scalable, collaborative AI systems.
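The selection step described in the abstract, picking the best model for each domain-function combination from benchmark scores, can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function name `select_best_models`, the model names, the domain and function labels, and the scores are all hypothetical placeholders.

```python
def select_best_models(scores):
    """Return the highest-scoring model for each (domain, function) pair.

    `scores` maps (model, domain, function) -> a benchmark score.
    On ties, the model encountered first is kept.
    """
    best = {}
    for (model, domain, function), score in scores.items():
        key = (domain, function)
        if key not in best or score > best[key][1]:
            best[key] = (model, score)
    # Drop the scores and keep only the chosen model per pair.
    return {key: model for key, (model, _) in best.items()}


# Hypothetical scores, invented purely for illustration.
scores = {
    ("model_a", "math", "answer"): 0.71,
    ("model_b", "math", "answer"): 0.64,
    ("model_a", "math", "revise"): 0.58,
    ("model_b", "math", "revise"): 0.66,
}

assignment = select_best_models(scores)
print(assignment)
```

With these made-up numbers, the sketch assigns `model_a` to answering and `model_b` to revising within the same domain, which is the kind of per-function split a heterogeneous MAS relies on.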
Problem

Research questions and friction points this paper is trying to address.

Enabling multi-agent systems with diverse LLMs for collective intelligence
Evaluating optimal LLM combinations across domains and functions
Demonstrating performance gains in heterogeneous vs homogeneous MAS
Innovation

Methods, ideas, or system contributions that make the work stand out.

Heterogeneous LLMs enhance multi-agent system intelligence
X-MAS-Bench evaluates diverse LLMs across domains and MAS-related functions
Heterogeneous MAS boosts performance without structural changes
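
One way to read "without structural changes" is that the agent workflow stays fixed and only the mapping from agent role to model is swapped. The sketch below illustrates that idea under stated assumptions: the two-step answer-then-revise workflow, the role names, and `stub_model` (a stand-in for a real LLM call) are hypothetical and not taken from the paper.

```python
from typing import Callable, Dict

ModelFn = Callable[[str], str]


def stub_model(name: str) -> ModelFn:
    """Hypothetical stand-in for an LLM call; it just tags the prompt."""
    def call(prompt: str) -> str:
        return f"{name}({prompt})"
    return call


def run_pipeline(question: str, models: Dict[str, ModelFn]) -> str:
    """Fixed two-step workflow: one agent answers, another revises."""
    draft = models["answer"](question)
    return models["revise"](draft)


# Homogeneous: every role is served by the same model.
homogeneous = {"answer": stub_model("model_a"), "revise": stub_model("model_a")}
# Heterogeneous: same workflow, different model per role.
heterogeneous = {"answer": stub_model("model_a"), "revise": stub_model("model_b")}

print(run_pipeline("q", homogeneous))
print(run_pipeline("q", heterogeneous))
```

`run_pipeline` is identical in both cases; only the dictionary passed in differs.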