🤖 AI Summary
Existing memory systems based on semantic similarity struggle to keep execution state consistent in long-horizon tasks, which often leads to fragmented decision-making and error propagation. This work proposes MAGE, a framework that treats memory as an execution-state manager. MAGE organizes interaction history into a hierarchical state tree, where the path from the root to the current node defines the agent’s state, and combines subgoal summaries, recent trajectories, and hints from earlier branches for decision-making. Four coordinated operations (Grow, Compress, Maintain, and Revise) manage the state tree dynamically, isolating errors from the active path and keeping context use bounded. On the MemoryArena benchmark, MAGE improves the average task success rate by 7.8–20.4 percentage points over baselines while reducing token consumption by 55.1%.
📝 Abstract
LLM-based agents increasingly tackle long-horizon tasks with interdependent decisions, where each action reshapes future constraints and intermediate errors can cascade. Existing RAG and agent memory systems organize histories by semantic similarity, retrieving content-relevant entries at decision time. We argue that this design mismatches execution-state dependencies: it fragments decision trajectories and mixes valid and erroneous traces, hindering coherent state reconstruction and error isolation. We propose MAGE (Memory as Agent-Guided Exploration), an active execution-state manager that stores interactions in a hierarchical state tree. The agent derives its state from the active root-to-current path, combining subgoal summaries, recent traces, and hints from prior branches. Four coupled operations maintain the tree: Grow records new traces, Compress summarizes completed subgoals, Maintain validates summaries, and Revise restores a target boundary and resumes on a new branch. This design bounds context growth while preserving state integrity and isolating flawed segments from the active path. Experiments on MemoryArena show that MAGE improves the average task success rate by 7.8–20.4 percentage points over baselines, while reducing token consumption by 55.1%.
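
To make the state-tree idea concrete, here is a minimal Python sketch of how the four operations might fit together. It is not the authors' implementation: the class and method names (`Node`, `StateTree`, `open_subgoal`, `context`), the choice to chain each new subgoal as a child of the previous one, the behavior of `maintain` on a failed check, and the toy travel-booking task are all assumptions made for illustration. In the paper the summaries and validity checks would presumably come from an LLM; here they are passed in as plain strings and a callback.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass(eq=False)  # identity-based equality; avoids recursive comparisons
class Node:
    """One subgoal segment in the state tree (hypothetical structure)."""
    subgoal: str
    parent: Optional["Node"] = None
    children: List["Node"] = field(default_factory=list)
    traces: List[str] = field(default_factory=list)  # raw interaction records
    summary: Optional[str] = None                    # set by compress()
    abandoned: bool = False                          # set by revise()
    hint: Optional[str] = None                       # lesson left by an abandoned branch


class StateTree:
    def __init__(self, task: str) -> None:
        self.root = Node(task)
        self.current = self.root

    def open_subgoal(self, subgoal: str) -> Node:
        """Start a new subgoal as a child of the current node."""
        child = Node(subgoal, parent=self.current)
        self.current.children.append(child)
        self.current = child
        return child

    def grow(self, trace: str) -> None:
        """Grow: record a new trace under the current node."""
        self.current.traces.append(trace)

    def compress(self, summary: str) -> None:
        """Compress: replace the current subgoal's traces with a summary in context."""
        self.current.summary = summary

    def maintain(self, node: Node, is_valid: Callable[[str], bool]) -> bool:
        """Maintain: validate a summary. Assumption: a failed summary is dropped,
        so context() falls back to that node's raw traces."""
        if node.summary is not None and not is_valid(node.summary):
            node.summary = None
            return False
        return True

    def active_path(self) -> List[Node]:
        """Nodes from the root down to the current node."""
        path, node = [], self.current
        while node is not None:
            path.append(node)
            node = node.parent
        return path[::-1]

    def revise(self, boundary: Node, hint: str, new_subgoal: str) -> Node:
        """Revise: go back to `boundary`, mark the branch below it as abandoned
        with a hint, and resume on a fresh branch."""
        path = self.active_path()
        idx = path.index(boundary)  # ValueError if boundary is not on the active path
        if idx + 1 < len(path):
            flawed = path[idx + 1]
            flawed.abandoned = True
            flawed.hint = hint
        self.current = boundary
        return self.open_subgoal(new_subgoal)

    def context(self, recent: int = 5) -> List[str]:
        """Build the agent's state: summaries, recent traces, and branch hints."""
        lines: List[str] = []
        for node in self.active_path():
            if node.summary is not None:
                lines.append(f"[summary] {node.subgoal}: {node.summary}")
            else:
                lines.extend(f"[trace] {t}" for t in node.traces[-recent:])
            for child in node.children:
                if child.abandoned:
                    lines.append(f"[hint] {child.subgoal}: {child.hint}")
        return lines


# Toy usage: a flawed hotel booking is isolated from the active path.
tree = StateTree("book a trip")
a = tree.open_subgoal("find flight")
tree.grow("searched flights to Rome")
tree.compress("flight booked for 3 May")
b = tree.open_subgoal("find hotel")
tree.grow("booked hotel for 4 May")  # inconsistent with the flight date
tree.revise(a, hint="hotel dates must start 3 May", new_subgoal="find hotel (retry)")
tree.grow("searched hotels from 3 May")
print("\n".join(tree.context()))
```

In this sketch the erroneous trace stays stored under the abandoned node but never reaches the context, which contains only the flight summary, the hint from the abandoned branch, and the new branch's trace. That is one simple way to read the abstract's claims about bounded context growth and error isolation.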