🤖 AI Summary
Existing agent memory systems rely on fixed memory construction-retrieval pipelines, which adapt poorly to diverse tasks and to memory banks that change over time, limiting performance gains. This work proposes MemPro, a framework that treats the entire memory pipeline as an evolvable program that an agent refines iteratively. By maintaining a version tree of executable pipeline implementations and applying a failure-mode-guided edit-and-debug mechanism, the approach repeatedly diagnoses recurring weaknesses and generates improved pipeline variants. Unlike prior methods that adapt only the memory bank or the prompt text, the framework evolves the pipeline itself so that it stays aligned with the changing memory content. Evaluated on LongMemEval, LoCoMo, HotpotQA, and NarrativeQA, the method consistently outperforms static and prompt-level evolving baselines within a few iterations and achieves a favorable performance-cost trade-off.
📝 Abstract
Long-horizon autonomous agents require memory systems to retain historical information, track evolving states, and reuse relevant knowledge beyond finite context windows. Existing agentic memory systems typically follow a memory construction-retrieval (MCR) pipeline, but they mostly adapt the memory bank while keeping the surrounding pipeline fixed after deployment. This fixed-pipeline design struggles to handle heterogeneous, task-specific failure modes and can become misaligned with memory banks that evolve in scale and structure over time. To address these limitations, we propose MemPro, a system-level evolution framework that treats the entire MCR pipeline as an evolvable program rather than adapting only the memory bank or prompt text. MemPro maintains a version tree of runnable memory-system implementations, in which an Evolving Agent iteratively selects promising versions, diagnoses recurring failures, and creates improved child versions through failure-mode-guided edit-debug refinement. Experiments on LongMemEval, LoCoMo, HotpotQA, and NarrativeQA show that MemPro consistently outperforms strong static and prompt-level evolving baselines within a few iterations, continues to improve as evolution proceeds, and achieves a favorable performance-cost trade-off. Code is available at https://github.com/wanghai673/MemPro.
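
The select, diagnose, and edit loop described in the abstract can be illustrated with a toy sketch. This is not MemPro's implementation (the linked repository holds that); it only shows the general shape of evolving a version tree under loudly stated assumptions. Every name here (`Version`, `evolve`, `toy_evaluate`, `toy_edit`, the `top_k` and `track_updates` parameters, and the failure-mode labels) is hypothetical. A pipeline is reduced to a parameter dictionary, the Evolving Agent's code edits are replaced by a hand-written rule, and version selection is simply greedy on score.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Version:
    """One pipeline variant in the version tree (hypothetical structure)."""
    name: str
    params: dict
    parent: Optional["Version"] = None
    children: list = field(default_factory=list)
    score: float = 0.0
    failures: list = field(default_factory=list)


def evolve(root: Version, evaluate: Callable, propose_edit: Callable,
           iterations: int) -> Version:
    """Greedy select -> diagnose -> edit loop; returns the best version found."""
    root.score, root.failures = evaluate(root.params)
    nodes = [root]
    for step in range(iterations):
        # Select: pick the highest-scoring version seen so far.
        parent = max(nodes, key=lambda v: v.score)
        if not parent.failures:
            break  # nothing left to diagnose
        # Diagnose: take the most frequent failure mode of that version.
        mode = Counter(parent.failures).most_common(1)[0][0]
        # Edit: create a child version that targets the diagnosed mode.
        child = Version(f"v{step + 1}", propose_edit(parent.params, mode), parent)
        child.score, child.failures = evaluate(child.params)
        parent.children.append(child)
        nodes.append(child)
    return max(nodes, key=lambda v: v.score)


def toy_evaluate(params: dict):
    """Stand-in for running a benchmark: returns (score, failure-mode labels)."""
    failures = []
    if params["top_k"] < 3:
        failures += ["missed_evidence", "missed_evidence"]
    if not params["track_updates"]:
        failures.append("stale_fact")
    return 1.0 - 0.25 * len(failures), failures


def toy_edit(params: dict, mode: str) -> dict:
    """Stand-in for the agent's edit-debug step: one rule per failure mode."""
    new = dict(params)
    if mode == "missed_evidence":
        new["top_k"] = params["top_k"] + 2
    elif mode == "stale_fact":
        new["track_updates"] = True
    return new


best = evolve(Version("v0", {"top_k": 1, "track_updates": False}),
              toy_evaluate, toy_edit, iterations=5)
print(best.name, best.score, best.params)
```

In this toy run the loop first addresses the more frequent failure mode and then the remaining one, stopping once the best version reports no failures. A system of the kind the abstract describes would replace `toy_edit` with agent-written changes to runnable pipeline code and would need a selection rule that avoids re-selecting a parent whose children keep failing.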