Exploring Cross-Scenario Generality of Agentic Memory Systems: Diagnostics and a Strong Baseline

📅 2026-06-02
📈 Citations: 0
Influential: 0
🤖 AI Summary
Most memory systems for large language model (LLM) agents are designed for a single scenario, and there is little evidence that they generalize across diverse tasks. This work evaluates eight memory systems plus an agentic harness built for search problems across five scenarios: single-turn QA, multi-session chat, agentic-trajectory QA, memory stress tests, and long-horizon agentic tasks. The harness, which self-manages flat text-file storage through tool calls, achieves the best cross-task ranking. Building on that result, the paper introduces AutoMEM, an agentic memory harness that, instead of placing a passive store behind a fixed pipeline, gives the agent active control over when and how to store and retrieve memories. Among the systems evaluated, AutoMEM achieves the best cross-scenario generality, suggesting that active, agent-controlled memory management is a key factor in generalization.
📝 Abstract
LLM agents accumulate histories that outgrow their context windows, motivating a growing literature on memory systems. Yet most existing designs are tuned to a single scenario (multi-session chat or a single trajectory format), and there is little evidence that they generalize across the heterogeneous trajectories agents encounter in deployment. We revisit eight memory systems plus an agentic harness for search problems, on five scenarios: single-turn QA, multi-session chat, agentic-trajectory QA, memory stress tests, and long-horizon agentic tasks. The harness, which self-manages flat text-file storage via tool calls, achieves the best cross-task ranking, suggesting that memory performance hinges on giving the agent active control over storage and retrieval rather than on a passive store behind a fixed pipeline. We instantiate this insight in AutoMEM, an agentic memory harness with a self-managed tool interface that achieves the best cross-scenario generality among the systems we evaluate.
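The idea of an agent that self-manages flat text-file storage via tool calls can be illustrated with a minimal sketch. This is not the AutoMEM implementation, which the abstract does not specify. The class name `FlatFileMemory`, the tool names `write`, `read`, and `search`, the `{"tool": ..., "args": ...}` call format, and the substring-based search are all assumptions made for illustration.

```python
import tempfile
from pathlib import Path


class FlatFileMemory:
    """Hypothetical flat text-file store exposed to an agent as tools."""

    def __init__(self, root: Path) -> None:
        self.root = root
        self.root.mkdir(parents=True, exist_ok=True)

    def _path(self, name: str) -> Path:
        # Keep only the final path component so a tool call cannot escape root.
        return self.root / f"{Path(name).name}.txt"

    def write(self, name: str, text: str) -> str:
        """Append a note to the named file; the agent decides when to call this."""
        with self._path(name).open("a", encoding="utf-8") as f:
            f.write(text.rstrip("\n") + "\n")
        return f"stored in {name}"

    def read(self, name: str) -> str:
        """Return the full contents of the named file, or "" if it is missing."""
        path = self._path(name)
        return path.read_text(encoding="utf-8") if path.exists() else ""

    def search(self, query: str) -> list[str]:
        """Case-insensitive substring match over every stored line."""
        needle = query.lower()
        hits = []
        for path in sorted(self.root.glob("*.txt")):
            for line in path.read_text(encoding="utf-8").splitlines():
                if needle in line.lower():
                    hits.append(f"{path.stem}: {line}")
        return hits

    def dispatch(self, call: dict) -> object:
        """Route a tool call of the assumed form {"tool": ..., "args": {...}}."""
        tools = {"write": self.write, "read": self.read, "search": self.search}
        return tools[call["tool"]](**call["args"])


# Example: the agent (simulated here by hand) stores a fact, then retrieves it.
memory = FlatFileMemory(Path(tempfile.mkdtemp()))
memory.dispatch({"tool": "write", "args": {"name": "user", "text": "Prefers metric units"}})
hits = memory.dispatch({"tool": "search", "args": {"query": "metric"}})
```

In this sketch the store has no pipeline of its own: nothing is written or retrieved unless the agent issues a tool call, which is the contrast with a passive store behind a fixed pipeline that the abstract draws.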
Problem

Research questions and friction points this paper is trying to address.

agentic memory
cross-scenario generality
memory systems
LLM agents
heterogeneous trajectories
Innovation

Methods, ideas, or system contributions that make the work stand out.

agentic memory
cross-scenario generalization
self-managed storage
tool-augmented LLM agents
memory system evaluation