🤖 AI Summary
This work addresses the generalization failure of zero-shot reinforcement learning (zero-shot RL) in partially observable environments, identifying for the first time the underlying degradation mechanism: standard approaches lack the capacity to model historical information, so they cannot infer latent states or adapt to unseen tasks when states, rewards, and environment dynamics are only partially observed. To overcome this, the authors propose a memory-augmented zero-shot RL framework that integrates (i) RNN- or Transformer-based memory encoders, (ii) reward-free pre-training, (iii) latent state inference, and (iv) awareness of changes in dynamics. Evaluated on partially observable benchmarks spanning multiple domains, the method significantly outperforms memoryless baselines and achieves robust zero-shot transfer across tasks. The implementation code and benchmark suite are publicly released.
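To make the core idea concrete, here is a minimal, illustrative sketch (not the paper's implementation) of a GRU-style memory encoder in NumPy: it folds an observation history into a single latent vector, so two histories that end in the same observation can still be told apart, which is exactly what a memoryless encoder cannot do. All names and dimensions below are assumptions for illustration only.

```python
import numpy as np

class GRUMemory:
    """Toy GRU-style memory encoder (illustrative; hypothetical names/sizes)."""

    def __init__(self, obs_dim, hidden_dim, seed=0):
        rng = np.random.default_rng(seed)
        d = obs_dim + hidden_dim
        # Weights for update gate, reset gate, and candidate state.
        self.Wz = rng.normal(0, 0.1, (hidden_dim, d))
        self.Wr = rng.normal(0, 0.1, (hidden_dim, d))
        self.Wh = rng.normal(0, 0.1, (hidden_dim, d))
        self.hidden_dim = hidden_dim

    @staticmethod
    def _sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def encode(self, observations):
        """Fold a sequence of observations into one latent state vector."""
        h = np.zeros(self.hidden_dim)
        for o in observations:
            x = np.concatenate([o, h])
            z = self._sigmoid(self.Wz @ x)      # update gate
            r = self._sigmoid(self.Wr @ x)      # reset gate
            h_tilde = np.tanh(self.Wh @ np.concatenate([o, r * h]))
            h = (1 - z) * h + z * h_tilde       # interpolate old/new state
        return h

# Two histories ending in the same observation yield different latents.
enc = GRUMemory(obs_dim=2, hidden_dim=4)
hist_a = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
hist_b = [np.array([0.0, 0.0]), np.array([0.0, 1.0])]
za, zb = enc.encode(hist_a), enc.encode(hist_b)
```

A downstream policy or behaviour foundation model would condition on the latent `h` instead of the raw (aliased) observation; the paper's actual encoders are learned RNNs or Transformers rather than this fixed random-weight toy.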
📄 Abstract
Recent work has shown that, under certain assumptions, zero-shot reinforcement learning (RL) methods can generalise to any unseen task in an environment after reward-free pre-training. Access to Markov states is one such assumption, yet in many real-world applications the Markov state is only partially observable. Here, we explore how the performance of standard zero-shot RL methods degrades under partial observability, and show that, as in single-task RL, memory-based architectures are an effective remedy. We evaluate our memory-based zero-shot RL methods in domains where states, rewards, and changes in dynamics are only partially observed, and show improved performance over memory-free baselines. Our code is open-sourced via: https://enjeeneer.io/projects/bfms-with-memory/.