🤖 AI Summary
This study addresses a critical limitation of existing reinforcement learning approaches: their emphasis on memory retention at the expense of dynamically updating or overwriting memory contents in partially observable environments. The work is the first to focus explicitly on the ability to rewrite memory, introducing a dedicated benchmark environment to evaluate it and systematically comparing recurrent neural networks, Transformers, and structured memory architectures in partially observable reinforcement learning settings. Experimental results demonstrate that classical recurrent models exhibit superior robustness and performance on memory rewriting tasks, whereas structured memory and Transformer-based approaches are effective only under specific conditions and tend to fail in more complex scenarios. These findings reveal fundamental limitations of current mainstream memory mechanisms and suggest that future designs must balance stable retention with flexible forgetting.
📝 Abstract
Effective decision-making in the real world depends on memory that is both stable and adaptive: environments change over time, and agents must retain relevant information over long horizons while also updating or overwriting outdated content when circumstances shift. Existing Reinforcement Learning (RL) benchmarks and memory-augmented agents focus primarily on retention, leaving the equally critical ability of memory rewriting largely unexplored. To address this gap, we introduce a benchmark that explicitly tests continual memory updating under partial observability, i.e., the natural setting where an agent must rely on memory rather than current observations, and use it to compare recurrent, transformer-based, and structured memory architectures. Our experiments reveal that classic recurrent models, despite their simplicity, demonstrate greater flexibility and robustness on memory rewriting tasks than modern structured memories, which succeed only under narrow conditions, and transformer-based agents, which often fail beyond trivial retention cases. These findings expose a fundamental limitation of current approaches and emphasize the necessity of memory mechanisms that balance stable retention with adaptive updating. Our work highlights this overlooked challenge, introduces benchmarks to evaluate it, and offers insights for designing future RL agents with explicit and trainable forgetting mechanisms. Code: https://quartz-admirer.github.io/Memory-Rewriting/
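To make the notion of "memory rewriting under partial observability" concrete, the toy environment below is a minimal sketch of the task family the abstract describes. It is an illustration under our own assumptions, not the paper's actual benchmark: the agent occasionally observes a cue that may later be overwritten by a new one, receives uninformative "blank" observations in between, and is finally rewarded only for reporting the *latest* cue. An agent that merely retains the first cue fails whenever an overwrite occurs.

```python
import random

class MemoryRewritingEnv:
    """Toy memory-rewriting task (illustrative, not the paper's benchmark).

    Observations are tuples: ("cue", k) when a cue is shown,
    ("blank", None) otherwise, and ("query", None) at the end.
    Reward is 1.0 iff the final action equals the most recent cue.
    """

    def __init__(self, horizon=10, n_cues=4, overwrite_prob=0.3, seed=0):
        self.rng = random.Random(seed)
        self.horizon = horizon
        self.n_cues = n_cues
        self.overwrite_prob = overwrite_prob

    def reset(self):
        self.t = 0
        self.latest_cue = self.rng.randrange(self.n_cues)
        return ("cue", self.latest_cue)  # initial cue is always shown

    def step(self, action):
        self.t += 1
        done = self.t >= self.horizon
        if done:
            # Query step: only the most recently shown cue is correct,
            # so the agent must have overwritten any earlier memory.
            reward = 1.0 if action == self.latest_cue else 0.0
            return ("query", None), reward, done
        if self.rng.random() < self.overwrite_prob:
            # A new cue replaces the old one; memory must be updated.
            self.latest_cue = self.rng.randrange(self.n_cues)
            return ("cue", self.latest_cue), 0.0, done
        return ("blank", None), 0.0, done  # partial observability
```

An oracle policy that tracks the latest observed cue solves the task, while a "first-cue" policy succeeds only on episodes with no overwrite, mirroring the retention-versus-rewriting distinction the benchmark is designed to probe.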