Zero-Shot Reinforcement Learning Under Partial Observability

📅 2025-06-18
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
This work addresses the generalization failure of zero-shot reinforcement learning (zero-shot RL) in partially observable environments, identifying for the first time the mechanism behind the performance degradation: standard approaches cannot model historical information, so they can neither infer latent states nor adapt to unseen tasks when observations of states, rewards, and environment dynamics are all partial. To overcome this, the authors propose a memory-augmented zero-shot RL framework integrating (i) RNN- or Transformer-based memory encoders, (ii) reward-free pre-training, (iii) latent state inference, and (iv) awareness of changes in dynamics. Evaluated across diverse partially observable benchmarks spanning multiple domains, the method significantly outperforms memoryless baselines and achieves robust cross-task zero-shot transfer. The implementation code and benchmark suite are publicly released.

๐Ÿ“ Abstract
Recent work has shown that, under certain assumptions, zero-shot reinforcement learning (RL) methods can generalise to any unseen task in an environment after reward-free pre-training. Access to Markov states is one such assumption, yet, in many real-world applications, the Markov state is only partially observable. Here, we explore how the performance of standard zero-shot RL methods degrades when subjected to partial observability, and show that, as in single-task RL, memory-based architectures are an effective remedy. We evaluate our memory-based zero-shot RL methods in domains where the states, rewards and a change in dynamics are partially observed, and show improved performance over memory-free baselines. Our code is open-sourced via: https://enjeeneer.io/projects/bfms-with-memory/.
Problem

Research questions and friction points this paper is trying to address.

Zero-shot RL performance under partial observability
Memory-based architectures improve zero-shot RL
Handling partially observed states, rewards, dynamics
Innovation

Methods, ideas, or system contributions that make the work stand out.

Memory-based architectures for partial observability
Zero-shot RL with partially observed states
Improved performance over memory-free baselines