AdaMEM: Test-Time Adaptive Memory for Language Agents

📅 2026-06-04

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Existing language agents struggle to adapt to environmental changes during long-horizon tasks because their memory mechanisms typically retrieve information only at the start of an episode, so the retrieved guidance drifts out of alignment with later decisions. This work proposes AdaMEM, a framework with a hybrid memory architecture that combines an offline-constructed long-term trajectory memory with an online-generated short-term strategy memory, enabling inference-time behavioral adaptation without updating model parameters. The design lets the agent trade token efficiency against adaptability across inference-time compute budgets. A complementary Step-wise Memory Fine-Tuning technique (STEP-MFT) trains the policy to synthesize high-quality strategies from retrieved experiences, yielding additional gains. Experiments report relative performance gains of up to 13% on ALFWorld and 11% on WebShop over static memory baselines, with consistently leading performance on agentic search on HotpotQA.

📝 Abstract

A central challenge for language agents is utilizing past experience to adapt to dynamic test-time conditions. While recent work demonstrates the promise of agentic memory mechanisms, most systems restrict retrieval to episode initiation. Consequently, agents are forced to rely on static guidance that becomes increasingly misaligned as long-horizon tasks unfold. To address this rigidity, we propose the Adaptive Memory Agent (AdaMEM), a novel framework for agent test-time adaptation. Without updating model parameters online, AdaMEM adapts agent behavior via a hybrid memory architecture: it maintains a long-term trajectory memory of raw experiences collected offline while generating dynamic short-term strategy memory on the fly to guide decision-making. This mechanism enables a trade-off between token efficiency and adaptability across varying inference-time compute levels. Empirically, AdaMEM significantly outperforms static memory baselines, achieving relative gains of up to 13% on ALFWorld and 11% on WebShop, with consistent leading performance extending to agentic search on HotpotQA. To further enhance this adaptation, we develop STEP-MFT, a Step-wise Memory Fine-Tuning technique that trains the policy to synthesize high-quality strategies from retrieved experiences, yielding additional performance gains. Our work establishes a new scaling dimension for agentic memory, supporting continuous reasoning and self-evolution post-deployment in real-world environments. Our code is available at https://github.com/yunx-z/AdaMEM.
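
The hybrid memory idea described in the abstract can be illustrated with a small Python sketch. This is not the authors' implementation (see the linked repository for that); it is a toy under stated assumptions. The names HybridMemory, Trajectory, template_synthesizer, run_episode, and the refresh_every parameter are all hypothetical, the word-overlap retriever stands in for whatever retriever the paper uses, and the template function stands in for the language-model call that would write a strategy. The sketch only shows the control flow: a frozen long-term store, a short-term strategy note regenerated during the episode, and a knob that trades refresh frequency (more tokens) against staleness.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Trajectory:
    """One offline-collected experience: a task description plus its action trace."""
    task: str
    steps: List[str]


def overlap(a: str, b: str) -> float:
    """Toy similarity: Jaccard overlap of lowercase word sets (assumed stand-in retriever)."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    union = wa | wb
    return len(wa & wb) / len(union) if union else 0.0


@dataclass
class HybridMemory:
    long_term: List[Trajectory]  # built offline, read-only at test time
    short_term: str = ""         # current strategy note, regenerated online
    refreshes: int = 0           # how many times the strategy was regenerated

    def retrieve(self, query: str, k: int = 1) -> List[Trajectory]:
        """Return the k stored trajectories most similar to the query."""
        def score(t: Trajectory) -> float:
            return overlap(query, t.task + " " + " ".join(t.steps))
        return sorted(self.long_term, key=score, reverse=True)[:k]

    def refresh(self, observation: str,
                synthesize: Callable[[str, List[Trajectory]], str]) -> None:
        """Retrieve for the current observation and rewrite the short-term strategy."""
        self.short_term = synthesize(observation, self.retrieve(observation))
        self.refreshes += 1


def template_synthesizer(observation: str, hits: List[Trajectory]) -> str:
    """Hypothetical placeholder for the model call that would write a strategy."""
    if not hits:
        return "no relevant experience"
    return f"similar to '{hits[0].task}': try {hits[0].steps[0]}"


def run_episode(observations: List[str], memory: HybridMemory,
                act: Callable[[str, str], str], refresh_every: int = 1) -> List[str]:
    """Act on each observation; refresh the strategy every `refresh_every` steps.

    A large refresh_every mimics retrieval only at episode start (static memory);
    refresh_every=1 re-retrieves at every step. No model parameters are updated.
    """
    actions = []
    for step, obs in enumerate(observations):
        if step % refresh_every == 0:
            memory.refresh(obs, template_synthesizer)
        actions.append(act(obs, memory.short_term))
    return actions


if __name__ == "__main__":
    store = [Trajectory("heat an egg", ["go to microwave"]),
             Trajectory("buy red shoes", ["search red shoes"])]
    obs = ["you see an egg in the kitchen", "the page lists red shoes"]
    for every in (1, len(obs)):
        mem = HybridMemory(list(store))
        print(every, run_episode(obs, mem, lambda o, s: s, refresh_every=every))
```

With refresh_every equal to the episode length, the strategy chosen at the first step is reused unchanged even after the observation shifts, which is the misalignment the abstract attributes to static memory. With refresh_every set to 1, the strategy follows the current observation at the cost of one retrieval and one synthesis call per step.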

Problem

Research questions and friction points this paper is trying to address.

language agents

test-time adaptation

agentic memory

dynamic environments

long-horizon tasks

Innovation

Methods, ideas, or system contributions that make the work stand out.

test-time adaptation

hybrid memory architecture

dynamic strategy memory