ElasticMem: Latent Memory as a Learnable Resource for LLM Agents

📅 2026-05-28

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses a limitation of existing long-term memory mechanisms for large language model agents: they typically treat memory as a fixed resource, which leads to high context overhead, sensitivity to noisy evidence, and inflexible capacity. The authors propose ElasticMem, a framework that models memory as an elastic, learnable latent resource. ElasticMem builds an offline latent memory bank, retrieves relevant memories adaptively from the reasoner's hidden state, assigns each retrieved memory a variable latent budget through a learned policy, and injects the selected latent states into generation as soft memory tokens. The full memory-use process is trained on downstream task rewards with group-relative policy optimization. On MemorySuite, with Qwen2.5-3B-Instruct and Qwen2.5-7B-Instruct backbones, ElasticMem improves weighted average QA accuracy by 26.2% and 24.6%, and ALFWorld success rate by 66.3% and 27.2%, respectively, over the strongest baselines, while achieving the lowest ALFWorld token cost.
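The retrieval, budgeting, and injection steps summarized above can be illustrated with a toy sketch. It is not the authors' implementation: the names `MemoryEntry`, `retrieve`, `allocate_budgets`, and `soft_memory_tokens` are hypothetical, dot-product scoring is an assumed stand-in for the learned retriever, and the softmax split of a fixed total budget is an assumed stand-in for the learned budget policy.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class MemoryEntry:
    """One bank entry (hypothetical layout): a retrieval key plus a content cache."""
    key: List[float]            # retrieval key
    cache: List[List[float]]    # content cache: one latent vector per slot


def dot(a: List[float], b: List[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def retrieve(hidden: List[float], bank: List[MemoryEntry],
             top_k: int) -> List[Tuple[float, MemoryEntry]]:
    """Score every entry against the hidden state and keep the top_k.

    Assumption: plain dot-product scoring; the paper's retriever is learned.
    """
    scored = sorted(((dot(hidden, e.key), e) for e in bank),
                    key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


def allocate_budgets(scores: List[float], total_budget: int) -> List[int]:
    """Stand-in for the learned budget policy (assumption).

    Splits total_budget across memories in proportion to softmax(scores),
    rounding down, so a few slots may remain unused.
    """
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]
    norm = sum(weights)
    return [int(total_budget * w / norm) for w in weights]


def soft_memory_tokens(hidden: List[float], bank: List[MemoryEntry],
                       top_k: int = 2, total_budget: int = 4):
    """Return the latent vectors to inject, plus the per-memory budgets."""
    hits = retrieve(hidden, bank, top_k)
    budgets = allocate_budgets([score for score, _ in hits], total_budget)
    tokens: List[List[float]] = []
    for (_, entry), budget in zip(hits, budgets):
        tokens.extend(entry.cache[:budget])  # keep only the first `budget` slots
    return tokens, budgets


bank = [
    MemoryEntry(key=[1.0, 0.0], cache=[[1.0], [2.0], [3.0]]),
    MemoryEntry(key=[0.0, 1.0], cache=[[4.0], [5.0]]),
    MemoryEntry(key=[-1.0, 0.0], cache=[[6.0]]),
]
tokens, budgets = soft_memory_tokens([1.0, 0.0], bank)
```

In this toy setting the better-matching memory receives more latent slots than the weaker match, which is the query-dependent allocation the summary describes; in ElasticMem both the retrieval and the budget decision are learned rather than fixed rules.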

📝 Abstract

Long-term memory is essential for LLM agents to reason coherently across extended interactions, personalize responses, and reuse past experience. However, existing memory-augmented methods typically treat memory as a fixed resource: text-space approaches concatenate retrieved memories into the context window, causing substantial token overhead and sensitivity to noisy evidence, while latent-space approaches reduce textual cost but still rely on rigid retrieval or fixed-capacity memory interfaces. This creates a mismatch between query-dependent memory utility and fixed memory allocation. We propose ElasticMem, a memory-augmented LLM framework that learns to use memory as an elastic latent resource. ElasticMem builds an offline latent memory bank with retrieval keys and content caches, retrieves memories adaptively from the reasoner's hidden state, assigns each retrieved memory a variable latent budget through a learned policy, and injects selected latent states as soft memory tokens for generation. The full memory-use process is optimized with downstream task rewards through group-relative policy optimization. We evaluate ElasticMem on MemorySuite, covering memory-intensive QA and embodied agent control. Across Qwen2.5-3B-Instruct and Qwen2.5-7B-Instruct backbones, ElasticMem improves weighted average QA accuracy by 26.2% and 24.6%, and improves ALFWorld success rate by 66.3% and 27.2%, respectively, over the strongest baselines, while achieving the lowest ALFWorld token cost. Ablations and qualitative analyses further show that adaptive retrieval and elastic budget allocation help ElasticMem prioritize useful evidence and transferable plans beyond rigid cosine similarity. Our code for ElasticMem will be released at https://github.com/ulab-uiuc/ElasticMem.
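The abstract states that the memory-use process is optimized with group-relative policy optimization but does not spell out the objective. As a minimal sketch of the group-relative idea only, the snippet below normalizes the rewards of several rollouts sampled for the same query against their group mean and standard deviation. The function name `group_relative_advantages`, the use of the population standard deviation, and the `eps` term are assumptions for illustration, not details taken from the paper.

```python
import statistics
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-8) -> List[float]:
    """Normalize each rollout's reward against its own sampling group.

    Rollouts that beat the group average get a positive advantage and
    the rest get a negative one; eps avoids division by zero when all
    rewards in the group are equal.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std (assumption)
    return [(r - mean) / (std + eps) for r in rewards]


# Four hypothetical rollouts for one query: 1.0 = task solved, 0.0 = failed.
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
```

Because the baseline comes from the group itself, no separate value model is needed; a retrieval or budget decision is reinforced only when its rollout scores above its siblings for the same query.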

Problem

Research questions and friction points this paper is trying to address.

long-term memory

memory-augmented LLM

elastic memory

adaptive retrieval

latent memory

Innovation

Methods, ideas, or system contributions that make the work stand out.

elastic memory

latent memory

adaptive retrieval