Scaling Self-Evolving Agents via Parametric Memory

📅 2026-06-03

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Existing memory-augmented large language model (LLM) agents store experience only in the prompt, as summaries or retrieved passages, while model parameters stay frozen. Their policy therefore does not change with experience, and anything dropped from the context is permanently lost. This work proposes TMEM, a self-evolving parametric memory framework in which the agent distills experience into fast LoRA weights Δₜ through lightweight online updates within a single episode. Actions are sampled from the policy with parameters θ₀ + Δₜ, and extraction actions produce the supervision that updates Δₜ, so the extraction policy can be optimized directly with reinforcement learning. An SVD-based initialization of the LoRA subspace accelerates online convergence. On LoCoMo, LongMemEval-S, multi-objective search, and CL-Bench, TMEM consistently outperforms summary-based and retrieval-based baselines across model scales.
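
The summary does not say how the SVD-based initialization is constructed. The sketch below shows one plausible reading, stated as an assumption rather than the paper's method: the LoRA down-projection A is set to the top-r right singular vectors of a frozen weight matrix, and the up-projection B starts at zero so that the initial update Δ₀ = BA vanishes. The function name `svd_lora_init` and the choice of which factor receives the singular vectors are hypothetical.

```python
import numpy as np

def svd_lora_init(W, rank):
    """Initialise LoRA factors (A, B) from the SVD of a frozen weight W.

    Assumption (not taken from the paper): A spans the top-`rank`
    right-singular subspace of W, and B is zero so that B @ A == 0.
    """
    # Thin SVD: W = U @ diag(S) @ Vt, singular values in descending order.
    _, _, Vt = np.linalg.svd(W, full_matrices=False)
    A = Vt[:rank].copy()                  # (rank, d_in), orthonormal rows
    B = np.zeros((W.shape[0], rank))      # (d_out, rank), zero initial update
    return A, B

if __name__ == "__main__":
    W = np.random.default_rng(0).normal(size=(6, 4))
    A, B = svd_lora_init(W, rank=2)
    print(A.shape, B.shape)               # shapes of the two LoRA factors
```

Starting with B at zero keeps the adapted model identical to the base model before any online update, while the rows of A restrict early updates to directions the frozen weight already uses most strongly.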

📝 Abstract

Existing memory-augmented LLM agents store past experience exclusively in prompt space, as textual summaries or retrieved passages, while keeping model parameters frozen throughout a rollout. Such agents can look up what they have seen but cannot learn from it: their policy is unchanged by experience, and any information dropped from the context is permanently lost. We introduce TMEM, a self-evolving parametric memory framework in which the agent not only compresses history into explicit memory but also absorbs distilled supervision into fast LoRA weights $Δ_t$ via lightweight online updates, genuinely altering its future behavior within a single episode. We formalize this as an agentic decision process with fast-weight rollout dynamics: actions are sampled from $π_{θ_0+Δ_t}$, while extraction actions produce supervision that updates $Δ_t$ for subsequent decisions. This view makes the extraction policy directly optimizable by RL: training $θ_0$ improves not only task actions but also the quality of the data used for online LoRA adaptation. We further propose SVD-based initialization of the LoRA subspace to accelerate online convergence. Experiments on LoCoMo, LongMemEval-S, multi-objective search, and CL-Bench show that TMEM consistently outperforms summary-based and retrieval-based baselines across different model scales.
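
The fast-weight rollout dynamics can be illustrated with a toy model. The sketch below is not the TMEM implementation: it assumes a single linear softmax policy, a hypothetical `extract` callback standing in for the extraction action, and one plain SGD step on cross-entropy as the "lightweight online update". Only the low-rank factors A and B (the fast weights Δ_t = BA) change during the episode; the base weights θ_0 stay frozen.

```python
import numpy as np

def softmax(z):
    z = z - z.max()                       # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

class FastWeightPolicy:
    """Toy policy pi_{theta_0 + Delta_t} with Delta_t = B @ A (LoRA-style)."""

    def __init__(self, W0, rank, lr=0.5, seed=0):
        rng = np.random.default_rng(seed)
        self.W0 = W0                      # frozen base weights (theta_0)
        d_out, d_in = W0.shape
        self.A = rng.normal(scale=0.1, size=(rank, d_in))
        self.B = np.zeros((d_out, rank))  # Delta_0 = 0
        self.lr = lr

    def probs(self, x):
        return softmax((self.W0 + self.B @ self.A) @ x)

    def absorb(self, x, target):
        """One SGD step on cross-entropy; updates A and B only."""
        g = self.probs(x)
        g[target] -= 1.0                  # dLoss/dlogits = p - onehot(target)
        grad_B = np.outer(g, self.A @ x)  # dLoss/dB
        grad_A = np.outer(self.B.T @ g, x)  # dLoss/dA
        self.B -= self.lr * grad_B
        self.A -= self.lr * grad_A

def rollout(policy, observations, extract, rng):
    """Sample an action per step, then fold extracted supervision into Delta_t."""
    actions = []
    for x in observations:
        p = policy.probs(x)
        actions.append(int(rng.choice(len(p), p=p)))
        supervision = extract(x)          # hypothetical: (input, target) or None
        if supervision is not None:
            policy.absorb(*supervision)   # later actions see the updated Delta_t
    return actions

if __name__ == "__main__":
    x = np.array([1.0, 0.0, 0.0, 0.0])
    policy = FastWeightPolicy(np.zeros((3, 4)), rank=2)
    before = policy.probs(x)[2]
    rollout(policy, [x] * 5, lambda obs: (obs, 2), np.random.default_rng(1))
    print(before, policy.probs(x)[2])     # probability of the supervised action
```

Because supervision produced at step t changes the weights used at step t+1, the quality of what `extract` returns directly affects later task actions, which is the coupling the abstract exploits when it trains the extraction policy with RL.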

Problem

Research questions and friction points this paper is trying to address.

memory-augmented agents

parametric memory

online learning

experience forgetting

frozen parameters

Innovation

Methods, ideas, or system contributions that make the work stand out.

parametric memory

self-evolving agents

online LoRA adaptation