CoMAS: Co-Evolving Multi-Agent Systems via Interaction Rewards

📅 2025-10-09
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the challenge of enabling large language model (LLM)-based agents to evolve continuously and autonomously without external supervision. The authors propose a multi-agent co-evolution framework grounded in interactive reward generation: rather than relying on human annotations or environmental feedback, agents iteratively engage in collaborative reasoning, with LLMs acting as decentralized evaluators that dynamically produce intrinsic reward signals, which in turn drive policy optimization via reinforcement learning. The key contribution is a self-evolution paradigm that requires no external supervision and is instead intrinsically motivated by social interaction among agents. Experiments demonstrate state-of-the-art performance across most evaluation settings, significantly surpassing untrained baselines, and reveal promising scalability: agent capability improves consistently as both the number and the behavioral diversity of agents increase.

📝 Abstract
Self-evolution is a central research topic in enabling large language model (LLM)-based agents to continually improve their capabilities after pretraining. Recent research has witnessed a transition from reinforcement learning (RL)-free to RL-based methods. Current RL-based methods either rely on dense external reward signals or extract intrinsic reward signals from LLMs themselves. However, these approaches diverge from the self-evolution mechanisms observed in human intelligence, where individuals learn and improve through mutual discussion and collaboration. In this work, we introduce Co-Evolving Multi-Agent Systems (CoMAS), a novel framework that enables agents to improve autonomously by learning from inter-agent interactions without external supervision. CoMAS generates intrinsic rewards from rich discussion dynamics, employs an LLM-as-a-judge mechanism to formulate these rewards, and optimizes each agent's policy through RL, thereby enabling decentralized and scalable co-evolution. Experimental results demonstrate that CoMAS consistently outperforms untrained agents and achieves state-of-the-art performance across most evaluation settings. Ablation studies confirm the necessity of interaction-based reward signals and reveal promising scalability as the number and diversity of agents increase. These findings establish CoMAS as a novel and effective paradigm for self-evolution in LLM-based agents.
Problem

Research questions and friction points this paper is trying to address.

Enabling autonomous agent improvement through mutual interaction learning
Generating intrinsic rewards from multi-agent discussion dynamics
Optimizing agent policies via reinforcement learning without external supervision
Innovation

Methods, ideas, or system contributions that make the work stand out.

Generates intrinsic rewards from agent discussion dynamics
Uses LLM-as-a-judge mechanism to formulate interaction rewards
Optimizes agent policies through decentralized RL training
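
The three Innovation bullets above can be read as one loop: agents take turns contributing to a discussion, an LLM-as-a-judge scores each contribution to produce an intrinsic reward, and each agent's policy is updated with that reward via RL. A minimal sketch of that loop follows; the function names (`agent_respond`, `judge_reward`, `rl_update`) and the dictionary-based "policy" are hypothetical placeholders for real LLM calls and a real policy-gradient update, not the paper's actual implementation.

```python
import random

def agent_respond(agent_id, problem, history):
    """Placeholder for an LLM agent contributing one turn of discussion."""
    return f"agent-{agent_id} proposes a step for '{problem}'"

def judge_reward(contribution, history):
    """Placeholder for the LLM-as-a-judge scoring a contribution in [0, 1]."""
    return random.random()

def rl_update(policy, contribution, reward):
    """Placeholder for an RL update on one agent's policy parameters."""
    policy["score"] += reward  # stands in for a real gradient step
    return policy

def co_evolve(problem, num_agents=3, rounds=2, seed=0):
    """Run the interaction-reward loop: discuss, judge, update, repeat."""
    random.seed(seed)
    policies = [{"id": i, "score": 0.0} for i in range(num_agents)]
    history = []
    for _ in range(rounds):
        for i in range(num_agents):
            contribution = agent_respond(i, problem, history)
            reward = judge_reward(contribution, history)  # intrinsic reward
            policies[i] = rl_update(policies[i], contribution, reward)
            history.append((i, contribution, reward))
    return policies, history

policies, history = co_evolve("example problem")
```

Because each agent both generates contributions and is judged by its peers' discussion context, no step in this loop consults an external label or environment reward, which is the decentralized, supervision-free property the framework targets.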