🤖 AI Summary
This work addresses a core difficulty for long-horizon language agents: retaining the memories that matter when the context window is limited, which amounts to a long-horizon resource-allocation trade-off under observability constraints. The authors formulate memory retention as a constrained stochastic optimization problem that jointly accounts for the memory budget, evidence utility, and delayed costs (miss penalties, reacquisition delays, and stale-information risk). They propose OSL-MR (Observability-Safe Learning for Memory Retention), a framework that strictly separates online-observable features from offline-available supervision and trains an evidence learner to estimate query-conditioned evidence value. A Mixed-Score heuristic serves both as a deployable online-safe baseline and as a structured inductive prior for learning. Experiments on LOCOMO and LongMemEval show that OSL-MR consistently outperforms recency-based methods, Generative Agents-style scoring, and other heuristic baselines, particularly under tight memory budgets. The Mixed-Score prior improves precision while preserving recall, and the results are robust across a wide range of cost configurations.
📝 Abstract
Long-horizon language agents accumulate observations, reasoning traces, and retrieved facts that exceed their finite context windows, making memory retention a fundamental resource-allocation problem. Existing memory systems improve management through heuristic scoring, retrieval optimization, or learned compression, but largely treat retention as a local decision problem and do not explicitly model its long-term consequences under realistic observability constraints. To fill this gap, we formulate memory retention as a constrained stochastic optimization problem with explicit budget feasibility, evidence utility, and delayed costs including miss penalties, reacquisition delays, and stale-information risk. We then propose OSL-MR (Observability-Safe Learning for Memory Retention), a novel framework that enforces a strict separation between online-observable features and offline-available supervision (OAS). OSL-MR combines an evidence learner trained from realized evidence supervision with a Mixed-Score heuristic that serves both as a deployable online-safe baseline and as a structured inductive prior for learning. The resulting policy learns query-conditioned evidence value directly from interaction data while remaining deployable under the same observability constraints. Experiments on LOCOMO and LongMemEval show that OSL-MR consistently outperforms recency-based methods, Generative Agents-style scoring, and other heuristic baselines, particularly under tight memory budgets. The Mixed-Score prior further improves precision while preserving recall, and sensitivity analysis demonstrates robustness across a wide range of cost configurations.
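To make the idea of budget-feasible retention with an online-safe heuristic more concrete, the following is a minimal illustrative sketch, not the authors' implementation. The abstract does not specify how the Mixed-Score is computed, so every name here (`MemoryItem`, `mixed_score`, `retain`), the three features (recency, importance, relevance), the weights, and the half-life are assumptions chosen for illustration. The only properties taken from the abstract are that the score uses features observable online and that the retained set must fit a memory budget.

```python
from dataclasses import dataclass


@dataclass
class MemoryItem:
    """Hypothetical memory record; all fields are assumed to be observable online."""
    text: str
    tokens: int        # size charged against the memory budget
    age_steps: int     # steps elapsed since the item was written
    importance: float  # in [0, 1], assigned when the item was written
    relevance: float   # in [0, 1], similarity to the current query


def mixed_score(item: MemoryItem,
                w_recency: float = 0.3,
                w_importance: float = 0.3,
                w_relevance: float = 0.4,
                half_life: float = 50.0) -> float:
    """Assumed heuristic: weighted sum of decayed recency, importance, and relevance."""
    recency = 0.5 ** (item.age_steps / half_life)  # halves every `half_life` steps
    return (w_recency * recency
            + w_importance * item.importance
            + w_relevance * item.relevance)


def retain(items: list[MemoryItem], budget_tokens: int) -> list[MemoryItem]:
    """Greedily keep the highest-scoring items whose total size fits the budget."""
    kept: list[MemoryItem] = []
    used = 0
    for item in sorted(items, key=mixed_score, reverse=True):
        if used + item.tokens <= budget_tokens:
            kept.append(item)
            used += item.tokens
    return kept
```

Calling `retain(items, budget_tokens=100)` returns the retained items in descending score order. In this sketch the greedy rule stands in for the constrained optimization described in the abstract, and a learned evidence-value estimate could in principle replace or be combined with `mixed_score`. How OSL-MR actually combines the two is not stated in the abstract.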