🤖 AI Summary
This work addresses the problem that long-horizon agents operating under a strict memory budget often fail to retain the evidence that future queries need, and must then fall back on rereading larger portions of the raw history. The authors propose a pre-query, budget-constrained retention setting and EMBER, a learned memory-writing policy that builds compact, source-backed “evidence capsules”. Each capsule pairs a verbatim source excerpt with retrieval keys and update metadata, and the writer is trained with post-query outcome feedback. On LongMemEval-RR, a retained-evidence protocol derived from LongMemEval, EMBER-14B reaches an F1 of 0.3017 at the 8,192-token retained-evidence budget, compared with 0.1765 for the strongest non-EMBER budgeted baseline. The abstract also reports gains in Retain-Recall and Read-Recall across budgets.
📝 Abstract
Long-horizon agents can archive large histories, but future answers still incur retrieval, rereading, and context costs. When retained memory misses answer-relevant evidence, the system must return to larger portions of the raw history. We study budgeted evidence survival: before the query is known, which source evidence should be retained so that it remains recoverable and usable under a fixed retained source-evidence token budget? We instantiate this setting as Budgeted Pre-Query Retention, where memory is written during ingestion and later read without access to the full raw stream. We introduce EMBER, a learned retention policy that constructs a compact, source-backed evidence state. EMBER stores evidence capsules: verbatim source excerpts paired with retrieval keys and update metadata, preserving both grounding and read-time access. Post-query outcome feedback trains the writer to preserve evidence across the ingestion-retrieval-answer chain. On LongMemEval-RR, our LongMemEval-derived retained-evidence protocol, EMBER-14B reaches 0.3017 F1 at the 8192-token retained-evidence comparison point, compared with 0.1765 for the strongest non-EMBER budgeted baseline. Across retained source-evidence budgets, EMBER improves F1, Retain-Recall, and Read-Recall, indicating that long-horizon memory depends on retaining evidence within the budget rather than rereading larger histories.
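To make the capsule idea concrete, the following is a minimal Python sketch of an evidence capsule and a budgeted store. It is an illustration under stated assumptions, not the authors' implementation: the names `EvidenceCapsule`, `retain`, and `read` are hypothetical, tokens are approximated by whitespace splitting, the budget is charged only for excerpt text, and a simple greedy score-per-token rule stands in for EMBER's learned retention policy, which the abstract does not specify.

```python
from dataclasses import dataclass, field


@dataclass
class EvidenceCapsule:
    """Hypothetical capsule: verbatim excerpt + retrieval keys + update metadata."""
    excerpt: str                                   # verbatim source text
    keys: tuple                                    # retrieval keys used at read time
    metadata: dict = field(default_factory=dict)   # e.g. source turn, last update
    score: float = 0.0                             # assumed writer-assigned utility

    @property
    def cost(self) -> int:
        # Assumption: whitespace tokens approximate the real tokenizer,
        # and only the excerpt counts against the budget.
        return len(self.excerpt.split())


def retain(capsules, budget):
    """Greedy stand-in for a learned policy: keep the best score-per-token
    capsules that still fit in the remaining budget."""
    kept, used = [], 0
    ranked = sorted(capsules, key=lambda c: c.score / max(c.cost, 1), reverse=True)
    for capsule in ranked:
        if used + capsule.cost <= budget:
            kept.append(capsule)
            used += capsule.cost
    return kept


def read(store, query, k=1):
    """Read from retained capsules only (no raw history), ranking by the
    number of retrieval keys that appear in the query."""
    words = set(query.lower().split())

    def overlap(capsule):
        return len(words & {key.lower() for key in capsule.keys})

    ranked = sorted(store, key=overlap, reverse=True)
    return [c for c in ranked[:k] if overlap(c) > 0]
```

In this sketch, anything that `retain` drops is unrecoverable at read time, which is the failure mode the budgeted pre-query setting is designed to measure.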