Role-Agent: Bootstrapping LLM Agents via Dual-Role Evolution

2026-06-09
Citations: 0
Influential: 0
AI Summary
This work addresses the limited generalization of large language model (LLM) agents on complex tasks, which the authors attribute to inefficient interaction feedback and static training environments. They propose Role-Agent, a framework in which a single LLM plays both the agent and the environment. In the "World-In-Agent" mechanism, process rewards are derived from how well the agent's predicted states align with the actual states. In the "Agent-In-World" mechanism, the model analyzes failed trajectories and resamples the training data accordingly. This dual-role strategy lets agent behavior and the training distribution refine each other over the course of training. Across multiple benchmarks, Role-Agent achieves an average performance gain of more than 4% over strong baselines.
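To make the World-In-Agent idea concrete, here is a minimal sketch of a process reward computed from state-prediction alignment. The summary does not say how alignment is measured, so the token-overlap (Jaccard) score below is an assumption chosen for illustration, and the function names `token_overlap` and `process_rewards` are hypothetical rather than taken from the paper.

```python
from typing import List


def token_overlap(predicted: str, actual: str) -> float:
    """Jaccard overlap of whitespace tokens (assumed stand-in for the alignment score)."""
    p = set(predicted.lower().split())
    a = set(actual.lower().split())
    if not p and not a:
        return 1.0  # two empty states are treated as perfectly aligned
    return len(p & a) / len(p | a)


def process_rewards(predicted_states: List[str], actual_states: List[str]) -> List[float]:
    """One reward per step: how closely the predicted next state matches the observed one."""
    return [token_overlap(p, a) for p, a in zip(predicted_states, actual_states)]


if __name__ == "__main__":
    predicted = ["the drawer is open", "you hold the key"]
    actual = ["the drawer is open", "you hold nothing"]
    print(process_rewards(predicted, actual))
```

A step whose prediction matches the observation exactly earns a reward of 1.0, and the reward shrinks as the prediction drifts from what the environment returned. A real system would likely use a stronger similarity measure than token overlap.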
Abstract
Although Large Language Model (LLM) agents have demonstrated strong performance on complex tasks, their learning is often limited by inefficient interaction feedback and static training environments, which hinder broader generalization. To address these limitations, this paper introduces Role-Agent, a framework that harnesses a single LLM to function concurrently as both the agent and the environment, enabling a bootstrapped co-evolution. Role-Agent comprises two synergistic components: World-In-Agent (WIA) and Agent-In-World (AIW). In WIA, the LLM acts as the agent and predicts future states after each action; the alignment between predicted and actual states is then used as a process reward, encouraging environment-aware reasoning. In AIW, the LLM analyzes failure modes from failed trajectories and retrieves tasks with similar failure patterns, thereby reshaping the training data distribution for targeted practice. Experiments on multiple benchmarks show that Role-Agent consistently improves performance, with an average gain of more than 4% over strong baselines.
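The Agent-In-World step described above can be sketched as a reweighting of the task pool. In the paper the LLM itself analyzes failures and retrieves similar tasks; the sketch below swaps that for simple label matching, so the failure-mode tags, the `base_weight` parameter, and the functions `resampling_weights` and `sample_tasks` are all hypothetical and shown only to illustrate how the training distribution might shift.

```python
import random
from collections import Counter
from typing import Dict, List, Set


def resampling_weights(
    task_tags: Dict[str, Set[str]],
    failure_modes: List[str],
    base_weight: float = 1.0,
) -> Dict[str, float]:
    """Upweight tasks whose (assumed) tags match failure modes seen in failed trajectories."""
    counts = Counter(failure_modes)
    return {
        task: base_weight + sum(counts[tag] for tag in tags)
        for task, tags in task_tags.items()
    }


def sample_tasks(weights: Dict[str, float], k: int, seed: int = 0) -> List[str]:
    """Draw k training tasks in proportion to their weights."""
    rng = random.Random(seed)
    tasks = list(weights)
    return rng.choices(tasks, weights=[weights[t] for t in tasks], k=k)


if __name__ == "__main__":
    pool = {"t1": {"wrong_tool"}, "t2": {"loop"}, "t3": set()}
    observed = ["wrong_tool", "wrong_tool", "loop"]
    w = resampling_weights(pool, observed)
    print(w)
    print(sample_tasks(w, k=5))
```

Tasks that share failure patterns with recent failed trajectories are drawn more often, while `base_weight` keeps every task reachable so the original distribution is not discarded entirely.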
Problem

Research questions and friction points this paper is trying to address.

LLM agents
interaction feedback
training environment
generalization
Innovation

Methods, ideas, or system contributions that make the work stand out.

Role-Agent
dual-role evolution
World-In-Agent
Agent-In-World
process reward