🤖 AI Summary
This work addresses a limitation of current large language model (LLM) agents: in multi-step interactive environments they struggle to retain and reuse effective strategies across tasks, so each episode effectively starts from scratch. To address this, the authors propose Unified Context Evolution (UCE), a framework that organizes agent experience into four types of evolvable context units: memory, strategy, workflow, and skill. Through a generate-retrieve-score-prune pipeline driven by how well units perform in use, UCE accumulates, refines, and transfers knowledge without gradient updates. Combined with an externalized library of these units and a scheduling module that directs generation toward the weakest unit types, UCE substantially improves performance, raising the ALFWorld success rate from 75.4% to 96.3% and the WebShop task score from 45.1% to 61.3%. The accumulated library can also be transferred to different actor backbone models without retraining.
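To make the generate-retrieve-score-prune lifecycle concrete, here is a minimal sketch of a typed unit library. The summary does not specify how units are retrieved, scored, or pruned, so the following are hypothetical choices made only for illustration: retrieval by word overlap with the task text, a Laplace-smoothed success rate as the usage score, and pruning of units that have been used at least `min_uses` times and still score below `prune_below`. The class and parameter names (`ECU`, `ECULibrary`, `min_uses`, `prune_below`) are likewise illustrative, not taken from the paper.

```python
from dataclasses import dataclass
from enum import Enum


class ECUType(Enum):
    MEMORY = "memory"
    STRATEGY = "strategy"
    WORKFLOW = "workflow"
    SKILL = "skill"


@dataclass
class ECU:
    """One evolvable context unit: typed text plus usage statistics."""
    uid: int
    kind: ECUType
    text: str
    uses: int = 0
    successes: int = 0

    @property
    def score(self) -> float:
        # Hypothetical scoring rule: Laplace-smoothed success rate,
        # so a never-used unit starts at a neutral 0.5.
        return (self.successes + 1) / (self.uses + 2)


class ECULibrary:
    def __init__(self, min_uses: int = 3, prune_below: float = 0.3):
        self.units: dict[int, ECU] = {}
        self._next_id = 0
        self.min_uses = min_uses        # hypothetical: uses required before pruning
        self.prune_below = prune_below  # hypothetical: score threshold for pruning

    def add(self, kind: ECUType, text: str) -> int:
        """Generate step: insert a unit distilled from a trajectory."""
        uid = self._next_id
        self._next_id += 1
        self.units[uid] = ECU(uid, kind, text)
        return uid

    def retrieve(self, query: str, k: int = 2) -> list[ECU]:
        """Retrieve step: rank by word overlap with the task, then by score."""
        words = set(query.lower().split())

        def key(u: ECU):
            return (len(words & set(u.text.lower().split())), u.score)

        return sorted(self.units.values(), key=key, reverse=True)[:k]

    def record(self, used: list[ECU], success: bool) -> None:
        """Score step: credit every retrieved unit with the episode outcome."""
        for u in used:
            u.uses += 1
            u.successes += int(success)

    def prune(self) -> list[int]:
        """Prune step: drop units that were tried enough times and kept failing."""
        dead = [uid for uid, u in self.units.items()
                if u.uses >= self.min_uses and u.score < self.prune_below]
        for uid in dead:
            del self.units[uid]
        return dead


if __name__ == "__main__":
    lib = ECULibrary()
    mug = lib.add(ECUType.STRATEGY, "look in the drawer for a mug")
    fridge = lib.add(ECUType.SKILL, "open the fridge then take the apple")
    print([u.uid for u in lib.retrieve("find a mug", k=1)])  # [0]
    for _ in range(3):  # the fridge skill is used three times and fails each time
        lib.record([lib.units[fridge]], success=False)
    print(lib.prune())  # [1]
```

Smoothing the success rate keeps a single early failure from discarding a new unit, while the `min_uses` guard ensures pruning only acts on units with some usage history.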
📝 Abstract
LLM-based agents can solve multi-step interactive tasks by combining reasoning with environment feedback, yet each episode starts from the same fixed context and any useful strategy discovered along the way is lost once the task ends. Existing approaches either limit learning to the current task or pool all experience into a single untyped store, without distinguishing knowledge types, tracking quality through use, or balancing what the library still lacks. We introduce Unified Context Evolution (UCE), a gradient-free framework that externalizes agent experience into an evolving library of typed Evolvable Context Units (ECUs). UCE decomposes experience into four complementary types (Memory, Strategy, Workflow, and Skill), each generated from trajectories under type-specific conditions, retrieved at decision time, scored through repeated usage outcomes, and pruned when no longer valuable. A scheduling module allocates each cycle's generation budget toward the types where the library is weakest. Across two interactive benchmarks, UCE raises ALFWorld success from 75.4% to 96.3% and WebShop task score from 45.1% to 61.3%, and the accumulated library transfers to alternative actor backbones without retraining.
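The abstract states that the scheduling module steers each cycle's generation budget toward the types where the library is weakest, but it does not define "weakest" or the allocation rule. The sketch below assumes one possible rule, purely for illustration: a per-type weakness equal to a coverage gap (how far the type is below a hypothetical `target_size`) plus a quality gap (one minus the mean unit score). The integer budget is then split in proportion to weakness using the largest-remainder method. The function names and the `target_size` parameter are hypothetical.

```python
from math import floor

TYPES = ("memory", "strategy", "workflow", "skill")


def type_weakness(scores: dict[str, list[float]], target_size: int = 10) -> dict[str, float]:
    """Hypothetical weakness signal per type: size shortfall plus quality shortfall."""
    weak = {}
    for t in TYPES:
        s = scores.get(t, [])
        coverage_gap = max(0, target_size - len(s)) / target_size
        quality_gap = 1.0 - (sum(s) / len(s) if s else 0.0)
        weak[t] = coverage_gap + quality_gap
    return weak


def allocate_budget(weak: dict[str, float], budget: int) -> dict[str, int]:
    """Split an integer generation budget in proportion to weakness (largest remainder)."""
    total = sum(weak.values())
    if total == 0:
        # Every type is saturated: spread the budget evenly instead.
        weak = {t: 1.0 for t in weak}
        total = float(len(weak))
    raw = {t: budget * w / total for t, w in weak.items()}
    alloc = {t: floor(x) for t, x in raw.items()}
    leftover = budget - sum(alloc.values())
    # Hand the remaining slots to the types with the largest fractional parts.
    for t in sorted(raw, key=lambda k: raw[k] - alloc[k], reverse=True)[:leftover]:
        alloc[t] += 1
    return alloc


if __name__ == "__main__":
    scores = {
        "memory": [0.9] * 10,    # full and high quality
        "strategy": [0.5] * 10,  # full but mediocre
        "workflow": [],          # empty
        "skill": [0.8] * 5,      # half full, decent quality
    }
    print(allocate_budget(type_weakness(scores), budget=10))
    # {'memory': 0, 'strategy': 2, 'workflow': 6, 'skill': 2}
```

Largest-remainder rounding guarantees that the allocations sum exactly to the budget. Under this assumed rule, an empty type such as `workflow` receives most of the cycle's generation slots.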