🤖 AI Summary
This work addresses how long-context dialogue systems should decide when to access memory and which historical utterances to retrieve, particularly when user preferences shift over time and may be inconsistent. The authors propose a Bayes factor-based metric to quantify memory utility, framing memory retrieval as an evaluation of the evidential strength that past dialogue turns provide about the user’s latent preference state. The same signal drives both decisions: whether to access memory at all and which turns to select. Experiments on four heterogeneous memory benchmarks show that the method outperforms embedding-based retrieval on long-context, preference-intensive tasks while remaining competitive in low-preference-density scenarios where semantic similarity suffices.
📝 Abstract
Long-context dialogue systems must decide both when to access memory and which parts of the interaction history are relevant. Existing approaches typically rely on heuristic retrieval signals or always-on memory usage, failing to account for the changing and potentially inconsistent nature of user preferences. In this work, we propose a unified framework for memory access and selection based on changing preferences. We formulate personalized memory retrieval as identifying which historical turns provide evidence about a user's latent preference state, rather than relying on surface-level semantic similarity. To this end, we quantify the utility of each memory turn using a Bayes factor, defined as the improvement in the model's likelihood of the reference response when the turn is included in context. This provides a principled measure of evidence strength and a unified signal for both memory access and selection. By framing memory retrieval as utility estimation, the model learns to identify salient turns and regulate memory usage based on expected utility. Experiments on four heterogeneous memory benchmarks show that our approach outperforms existing embedding-based retrieval on long-context, preference-intensive tasks where modeling changing preferences is essential, while remaining competitive in low-density regimes where semantic similarity suffices.
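The Bayes factor utility described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that the "improvement in likelihood" is read as a log-likelihood difference, log p(y | turn, query) - log p(y | query), and that a single threshold on that score governs both access and selection. The names `log_bayes_factors`, `select_memory`, `threshold`, `top_k`, and the word-overlap scorer `toy_loglik` are hypothetical; a real system would plug in a language model's log-likelihood. Because the score needs the reference response, it is only computable at training time, where it could serve as a supervision signal.

```python
from typing import Callable, List, Sequence

# Hypothetical scorer type: returns log p(response | context) under some model.
LogLikelihood = Callable[[Sequence[str], str], float]


def toy_loglik(context: Sequence[str], response: str) -> float:
    """Toy stand-in for a language model (an assumption for illustration):
    each response word not present in the context costs one unit."""
    seen = {w for text in context for w in text.lower().split()}
    missing = [w for w in response.lower().split() if w not in seen]
    return -float(len(missing))


def log_bayes_factors(
    history: Sequence[str], query: str, reference: str, loglik: LogLikelihood
) -> List[float]:
    """Per-turn utility: log-likelihood of the reference response with the
    turn in context, minus the log-likelihood without it."""
    base = loglik([query], reference)
    return [loglik([turn, query], reference) - base for turn in history]


def select_memory(
    history: Sequence[str],
    query: str,
    reference: str,
    loglik: LogLikelihood,
    threshold: float = 0.0,  # assumed cutoff; not taken from the paper
    top_k: int = 2,          # assumed budget; not taken from the paper
) -> List[str]:
    """Return the highest-utility turns above the threshold.
    An empty result means memory is not accessed for this query."""
    scores = log_bayes_factors(history, query, reference, loglik)
    ranked = sorted(
        ((s, i) for i, s in enumerate(scores) if s > threshold), reverse=True
    )
    return [history[i] for _, i in ranked[:top_k]]
```

In this sketch, a turn that states a current preference raises the likelihood of the reference response and is selected, while unrelated or outdated turns score zero and are skipped. When no turn clears the threshold, the function returns an empty list, which corresponds to answering without memory.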