Why Linear Recurrent Memory Works in Partially Observable Reinforcement Learning

📅 2026-05-29

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work asks why linear recurrent memory units are effective in partially observable reinforcement learning, and answers by constructing two linear filters. The first exactly reproduces the pre-softmax logits of the belief vector in a hidden Markov model (HMM) with a deterministic transition matrix, so it is a sufficient statistic for optimal policy learning. The second achieves vanishing state-decoding error when the transition matrix is nearly deterministic, reducing state ambiguity to near zero. The results extend to action-controlled HMMs, where the filters become time-varying with action-dependent dynamics. Numerical experiments illustrate the main results and show that the constructed linear filter is a strong feature extractor in a small reinforcement learning game.

📝 Abstract

The family of linear recurrent neural networks has shown strong performance as recurrent memory units in partially observable reinforcement learning. We provide a theoretical justification for their empirical effectiveness by constructing and studying two linear filters: (i) the first exactly reproduces the pre-softmax logits of the belief vector in a hidden Markov model (HMM) under a deterministic transition matrix, thereby serving as a sufficient statistic for optimal policy learning, (ii) the second achieves vanishing state-decoding error under a nearly deterministic transition matrix, thus reducing state ambiguity to near zero. The results extend to action-controlled HMMs, where the corresponding linear filters become time-varying with action-dependent dynamics. We illustrate our main results through numerical experiments and further show that the constructed linear filter serves as a strong feature extractor in a small reinforcement learning game.
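The first claim in the abstract can be illustrated with a small numerical sketch. It is not the paper's construction. It assumes that "deterministic transition matrix" means a permutation matrix and that every emission probability is strictly positive, so all logarithms are finite. Under those assumptions the log of the unnormalized belief follows a linear recurrence driven by the one-hot observation, and its softmax equals the Bayes-filter belief. All names here (`T`, `O`, `b0`, `bayes_filter`, `linear_logit_filter`) are hypothetical and chosen for illustration.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D vector.
    z = np.exp(x - x.max())
    return z / z.sum()

def bayes_filter(T, O, b0, obs):
    # Standard HMM belief update: predict with T, correct with O, normalize.
    # T[i, j] = P(next state j | state i), O[j, y] = P(observation y | state j).
    b = b0.copy()
    for y in obs:
        b = O[:, y] * (T.T @ b)
        b /= b.sum()
    return b

def linear_logit_filter(T, O, b0, obs):
    # Linear recurrence h <- A h + B e_y, with A = T^T and B = log O.
    # Assumption: T is a permutation matrix, so log(T^T b) = T^T log(b).
    A = T.T
    B = np.log(O)
    h = np.log(b0)
    for y in obs:
        h = A @ h + B[:, y]  # B[:, y] is B applied to the one-hot observation
    return h

rng = np.random.default_rng(0)
n_states, n_obs = 4, 3
T = np.eye(n_states)[rng.permutation(n_states)]     # random permutation matrix
O = rng.dirichlet(np.ones(n_obs), size=n_states)    # strictly positive emission rows
b0 = np.full(n_states, 1.0 / n_states)              # uniform prior
obs = rng.integers(0, n_obs, size=20)               # arbitrary observation sequence

belief = bayes_filter(T, O, b0, obs)
logits = linear_logit_filter(T, O, b0, obs)
max_gap = np.max(np.abs(softmax(logits) - belief))
print("max |softmax(logits) - belief| =", max_gap)
```

The recurrence never normalizes, which is what keeps it linear: normalization is deferred to the final softmax. For the action-controlled case described in the abstract, a sketch under the same assumptions would select `A` from a set of per-action permutation matrices at each step.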

Problem

Research questions and friction points this paper is trying to address.

partially observable reinforcement learning

linear recurrent memory

hidden Markov model

state ambiguity

sufficient statistic

Innovation

Methods, ideas, or system contributions that make the work stand out.

linear recurrent memory

partially observable reinforcement learning

hidden Markov model