Joint Agent Memory and Exploration Learning via Novelty Signals

📅 2026-05-31

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

In open-ended environments, agents face inefficient exploration, high storage costs for interaction histories, and a lack of reliable supervision for latent memory. This work proposes JAMEL, a framework that jointly trains the agent's memory and its exploration policy. JAMEL compresses historical interactions into latent memory and uses deterministic, persistent novelty signals, such as code coverage in the GUI domain, as annotation-free supervision that helps the agent distinguish exhausted behaviors from unseen ones. Memory and exploration thereby reinforce each other in a loop, and the trained agent generalizes to unseen environments. Empirically, JAMEL's exploration capability outperforms open-weight baselines and rivals the exploration depth of a closed-source model while reducing token consumption.

📝 Abstract

In open-ended environments, exploration is fundamental for autonomous agents, yet current language model agents struggle with this. Effective exploration requires memory, but retaining raw interaction histories is computationally expensive over long trajectories. While latent memory offers a solution to compress interaction histories, its training lacks reliable supervisory signals. We introduce Joint Agent Memory and Exploration Learning (JAMEL), a framework that trains agentic memory and exploration policy together through novelty-driven interaction. We observe that memory and exploration form a mutually dependent loop: sustained exploration requires memory to distinguish exhausted behaviors from unseen ones, while novelty-seeking interaction provides the supervision needed to make memory useful for future exploration. By utilizing deterministic and persistent novelty signals such as code coverage in the GUI domain, we provide natural, annotation-free supervision for the memory module. Empirical evaluations demonstrate that JAMEL successfully generalizes to unseen environments. Its exploration capability outperforms open-weight baselines and rivals the exploration depth of a closed-source model while reducing token consumption. Our code and model are open-sourced at https://github.com/MobileLLM/JAMEL.
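
To make the idea of a deterministic, persistent novelty signal concrete, the following is a minimal sketch of a coverage-based novelty reward. It is not taken from the JAMEL codebase: the class name `CoverageNovelty`, the `reward` method, and the method-level coverage identifiers are all hypothetical, and the assumption that the reward equals the count of newly covered code units is ours, not the paper's.

```python
from dataclasses import dataclass, field


@dataclass
class CoverageNovelty:
    """Hypothetical tracker: remembers every code unit covered so far."""

    seen: set = field(default_factory=set)

    def reward(self, covered_now: set) -> int:
        """Return how many code units are covered for the first time.

        Assumption: novelty is the number of newly covered units. The
        signal is deterministic (same coverage gives the same reward) and
        persistent (a unit never counts as novel twice).
        """
        new_units = covered_now - self.seen
        self.seen |= new_units
        return len(new_units)


# Illustrative coverage reports after three agent actions (made-up names).
tracker = CoverageNovelty()
first = tracker.reward({"MainActivity.onCreate", "MainActivity.onClick"})
repeat = tracker.reward({"MainActivity.onClick"})
third = tracker.reward({"MainActivity.onClick", "SettingsActivity.onCreate"})
print(first, repeat, third)
```

In this sketch a repeated action earns no reward, so an agent can only keep collecting reward if it can tell exhausted behaviors from unseen ones, which is the role the abstract assigns to memory.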

Problem

Research questions and friction points this paper is trying to address.

exploration

agent memory

novelty signals

open-ended environments

supervisory signals

Innovation

Methods, ideas, or system contributions that make the work stand out.

joint memory-exploration learning

novelty-driven exploration

latent memory