ProPlay: Procedural World Models for Self-Evolving LLM Agents

📅 2026-06-10
📈 Citations: 0
Influential citations: 0
🤖 AI Summary
In partially observable environments, agents struggle to self-evolve through unsupervised interaction because they must explore actively, learn from limited environmental feedback, and judge how far to trust past experience. This work proposes ProPlay, a procedural world model that abstracts successful trajectories into composable procedures and organizes them into a procedure graph, closing the loop between procedure-level mental simulation (preplay) and environmental feedback. Each causal transition between task stages carries a reliability record embedding, which lets the memory and planning components be refined together from episode to episode. On public benchmarks, ProPlay is reported to consistently improve on strong baselines in environment understanding and self-evolution capability.
📝 Abstract
Self-evolving agents are expected to improve through interaction without external supervision, but this remains difficult in partially observable environments where agents must explore actively, learn from limited feedback, and decide when to trust prior experience. Existing LLM-agent methods often rely on memory or planning modules, yet they rarely close the loop between them to continually refine an internal understanding of environment dynamics. We introduce ProPlay, a procedural world model that supports procedure-level preplay, where agents can rehearse future procedural paths using the learned world knowledge. Rather than representing experience as isolated rules or low-level action constraints, ProPlay abstracts successful trajectories into procedures and organizes them in a procedure graph that captures causal transitions among task stages. Each transition is associated with a reliability record embedding that estimates its task-specific contribution from past outcomes. Before each episode, ProPlay simulates future procedural trajectories over known graph structures and uses them as structured soft guidance; after execution, it refines the graph using environment feedback. Experiments on public benchmarks show that ProPlay consistently improves environment understanding and self-evolution capability over strong baselines. Our code has been released at https://github.com/antman9914/proplay.
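The preplay-then-refine loop described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the class `ProcedureGraph`, its methods `refine` and `preplay`, the procedure names, and the `max_depth` parameter are all hypothetical, and a smoothed success rate stands in for the paper's reliability record embedding, whose actual form is not given here.

```python
class ProcedureGraph:
    """Toy procedure graph: nodes are procedure names, edges are transitions.

    Hypothetical illustration only; a per-edge success count replaces the
    paper's reliability record embedding.
    """

    def __init__(self):
        self.records = {}     # (src, dst) -> [successes, trials]
        self.successors = {}  # src -> set of known next procedures

    def reliability(self, src, dst):
        # Laplace-smoothed success rate; an edge with no record scores 0.5.
        wins, trials = self.records.get((src, dst), (0, 0))
        return (wins + 1) / (trials + 2)

    def refine(self, path, success):
        # After an episode, update every transition on the executed path
        # with the environment's outcome.
        for src, dst in zip(path, path[1:]):
            self.successors.setdefault(src, set()).add(dst)
            record = self.records.setdefault((src, dst), [0, 0])
            record[0] += int(success)
            record[1] += 1

    def preplay(self, start, goal, max_depth=6):
        # Before an episode, enumerate simple paths over known edges and
        # return the one with the highest product of edge reliabilities.
        best_path, best_score = None, 0.0
        stack = [([start], 1.0)]
        while stack:
            path, score = stack.pop()
            node = path[-1]
            if node == goal:
                if score > best_score:
                    best_path, best_score = path, score
                continue
            if len(path) > max_depth:
                continue
            for nxt in sorted(self.successors.get(node, ())):
                if nxt not in path:  # keep paths simple (no revisits)
                    step = self.reliability(node, nxt)
                    stack.append((path + [nxt], score * step))
        return best_path, best_score


graph = ProcedureGraph()
graph.refine(["find_key", "open_door", "reach_exit"], success=True)
graph.refine(["find_key", "break_window", "reach_exit"], success=False)
graph.refine(["find_key", "open_door", "reach_exit"], success=True)

# The suggested path would be passed to the agent as soft guidance, not a hard plan.
path, score = graph.preplay("find_key", "reach_exit")
```

In this toy run the rehearsed path goes through `open_door`, because both of its transitions succeeded twice while the `break_window` route failed once. A real system would need task-conditioned reliability and a way to handle stages the graph has never seen, which this sketch leaves out.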
Problem

Research questions and friction points this paper is trying to address.

self-evolving agents
partially observable environments
procedural world models
environment understanding
LLM agents
Innovation

Methods, ideas, or system contributions that make the work stand out.

procedural world model
self-evolving agents
procedure graph
procedure-level preplay
reliability record embedding