EMBER: Efficient Memory via Budgeted Evidence Retention for Long-Horizon Agents

📅 2026-06-04

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses a problem for long-horizon agents: under a strict memory budget, retained memory often misses the evidence that future queries need, which forces the system to reread larger portions of the raw history. The authors propose EMBER, a learned retention policy that decides, before the query is known, which source evidence to keep within a fixed token budget. EMBER writes compact, source-backed “evidence capsules,” each pairing a verbatim source excerpt with retrieval keys and update metadata, and the writer is trained with post-query outcome feedback. On LongMemEval-RR, a retained-evidence protocol derived from LongMemEval, EMBER-14B reaches an F1 of 0.3017 at the 8192-token retained-evidence budget, compared with 0.1765 for the strongest non-EMBER budgeted baseline. The paper also reports gains in Retain-Recall and Read-Recall across retained-evidence budgets.

📝 Abstract

Long-horizon agents can archive large histories, but future answers still incur retrieval, rereading, and context costs. When retained memory misses answer-relevant evidence, the system must return to larger portions of the raw history. We study budgeted evidence survival: before the query is known, which source evidence should be retained so that it remains recoverable and usable under a fixed retained source-evidence token budget? We instantiate this setting as Budgeted Pre-Query Retention, where memory is written during ingestion and later read without access to the full raw stream. We introduce EMBER, a learned retention policy that constructs a compact, source-backed evidence state. EMBER stores evidence capsules: verbatim source excerpts paired with retrieval keys and update metadata, preserving both grounding and read-time access. Post-query outcome feedback trains the writer to preserve evidence across the ingestion-retrieval-answer chain. On LongMemEval-RR, our LongMemEval-derived retained-evidence protocol, EMBER-14B reaches 0.3017 F1 at the 8192-token retained-evidence comparison point, compared with 0.1765 for the strongest non-EMBER budgeted baseline. Across retained source-evidence budgets, EMBER improves F1, Retain-Recall, and Read-Recall, indicating that long-horizon memory depends on retaining evidence within the budget rather than rereading larger histories.
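To make the setting concrete, here is a minimal Python sketch of an evidence capsule and of retention under a fixed token budget. It is an illustration under stated assumptions, not the authors' implementation: the names `EvidenceCapsule`, `retain`, and `read` are hypothetical, tokens are counted by whitespace splitting, the per-capsule scores stand in for whatever the learned writer would produce, and the greedy score-per-token selection is a simple substitute for the learned retention policy.

```python
from dataclasses import dataclass, field


@dataclass
class EvidenceCapsule:
    excerpt: str                               # verbatim source excerpt
    keys: list[str]                            # retrieval keys used at read time
    meta: dict = field(default_factory=dict)   # update metadata (fields are hypothetical)

    @property
    def tokens(self) -> int:
        # Assumption: whitespace tokens approximate the source-evidence token cost.
        return len(self.excerpt.split())


def retain(capsules, scores, budget):
    """Greedy pre-query retention: keep high score-per-token capsules that fit the budget."""
    order = sorted(
        range(len(capsules)),
        key=lambda i: scores[i] / max(capsules[i].tokens, 1),
        reverse=True,
    )
    kept, used = [], 0
    for i in order:
        cost = capsules[i].tokens
        if used + cost <= budget:   # skip anything that would exceed the budget
            kept.append(capsules[i])
            used += cost
    return kept


def read(kept, query):
    """Read-time lookup over retained capsules only; the raw stream is not consulted."""
    terms = set(query.lower().split())
    return [c for c in kept if terms & {k.lower() for k in c.keys}]


# Hypothetical ingestion stream with stand-in scores for the learned writer.
stream = [
    EvidenceCapsule("Alice moved to Lisbon in March", ["alice", "lisbon"], {"turn": 3}),
    EvidenceCapsule(
        "The weather was nice today and we talked about it for a while",
        ["weather"],
        {"turn": 4},
    ),
    EvidenceCapsule("Bob prefers tea", ["bob", "tea"], {"turn": 7}),
]
scores = [0.9, 0.2, 0.6]

memory = retain(stream, scores, budget=10)
hits = read(memory, "where does alice live")
```

With a 10-token budget, the two short, high-scoring capsules fit and the long low-scoring one is dropped, so a later query about Alice can still be answered from retained evidence. If the Alice capsule had been dropped instead, no read-time strategy could recover it without returning to the raw history, which is the failure mode the abstract describes.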

Problem

Research questions and friction points this paper is trying to address.

long-horizon agents

budgeted evidence retention

memory efficiency

evidence survival

source-evidence token budget

Innovation

Methods, ideas, or system contributions that make the work stand out.

Budgeted Evidence Retention

Evidence Capsules

Long-Horizon Memory