Agents of Change: Self-Evolving LLM Agents for Strategic Planning

📅 2025-06-05
📈 Citations: 0
Influential: 0
🤖 AI Summary
Large language model (LLM) agents struggle with sustained strategic optimization in long-horizon planning tasks. Method: This paper introduces a multi-role collaborative self-evolution architecture grounded in the board game *Settlers of Catan*, implemented via the Catanatron framework. It employs four specialized LLM roles—Analyzer, Researcher, Coder, and Player—to enable end-to-end autonomous improvement, including failure diagnosis, strategy generation, and prompt/code rewriting. Unlike static prompt engineering, this approach establishes the first fully LLM-driven, closed-loop self-evolution system. Contribution/Results: Evaluated using Claude 3.7 and GPT-4o across multiple game episodes, the agent dynamically refines strategies, propagates high-performing behavioral samples, and incrementally enhances adaptive reasoning. It consistently outperforms handcrafted baselines, demonstrating that LLM agents possess trainable, strategy-level self-evolution capabilities.
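The closed-loop cycle the summary describes (Analyzer diagnoses failures, Researcher proposes a strategy, Coder rewrites the player's prompt, Player replays) can be sketched as below. This is an illustrative stub only: the role names follow the paper, but every function signature, data shape, and heuristic here is an assumption, not the authors' actual implementation or the Catanatron API.

```python
# Hypothetical sketch of the Analyzer -> Researcher -> Coder -> Player
# self-evolution loop. All interfaces are illustrative assumptions.

def analyzer(game_logs):
    """Diagnose recent gameplay; here, just compute the loss rate."""
    losses = [g for g in game_logs if not g["won"]]
    return {"loss_rate": len(losses) / len(game_logs)}

def researcher(diagnosis):
    """Propose a strategy revision when performance is poor (stubbed)."""
    if diagnosis["loss_rate"] >= 0.5:
        return "prioritize early resource diversity"
    return None  # no change proposed

def coder(strategy):
    """Rewrite the player's prompt to encode the new strategy (stubbed)."""
    return f"You are a Settlers of Catan player. Strategy: {strategy}."

def player(prompt, n_games=4):
    """Play games under the given prompt; a deterministic stand-in
    for actual LLM-driven play inside Catanatron."""
    improved = "resource diversity" in prompt
    return [{"won": improved or i % 2 == 0} for i in range(n_games)]

def evolve(iterations=3):
    """Run the closed loop: play, diagnose, revise, replay."""
    prompt = coder("baseline")
    for _ in range(iterations):
        diagnosis = analyzer(player(prompt))
        strategy = researcher(diagnosis)
        if strategy is not None:
            prompt = coder(strategy)  # prompt/code rewriting step
    logs = player(prompt)
    return sum(g["won"] for g in logs) / len(logs)
```

Running `evolve()` in this toy setup shows the intended dynamic: the loop rewrites the prompt once the loss rate crosses a threshold, and the final win rate rises over iterations.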

📝 Abstract
Recent advances in LLMs have enabled their use as autonomous agents across a range of tasks, yet they continue to struggle with formulating and adhering to coherent long-term strategies. In this paper, we investigate whether LLM agents can self-improve when placed in environments that explicitly challenge their strategic planning abilities. Using the board game Settlers of Catan, accessed through the open-source Catanatron framework, we benchmark a progression of LLM-based agents, from a simple game-playing agent to systems capable of autonomously rewriting their own prompts and their player agent's code. We introduce a multi-agent architecture in which specialized roles (Analyzer, Researcher, Coder, and Player) collaborate to iteratively analyze gameplay, research new strategies, and modify the agent's logic or prompt. By comparing manually crafted agents to those evolved entirely by LLMs, we evaluate how effectively these systems can diagnose failure and adapt over time. Our results show that self-evolving agents, particularly when powered by models like Claude 3.7 and GPT-4o, outperform static baselines by autonomously adapting their strategies, passing along sample behavior to game-playing agents, and demonstrating adaptive reasoning over multiple iterations.
Problem

Research questions and friction points this paper is trying to address.

Can LLM agents self-improve in strategic planning tasks?
Evaluating self-evolving agents in board-game environments
Comparing the performance of manually crafted vs. LLM-evolved agents
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-role agent architecture (Analyzer, Researcher, Coder, Player) for strategic planning
Self-evolving agents that rewrite their own prompts and code
Benchmarking with Settlers of Catan via the Catanatron framework