🤖 AI Summary
This work addresses the generalization failure of zero-shot reinforcement learning (zero-shot RL) in partially observable environments, identifying for the first time the underlying degradation mechanism: standard approaches lack the capacity to model historical information, so they cannot infer latent states or adapt to unseen tasks when states, rewards, and environment dynamics are only partially observed. To overcome this, the authors propose a memory-augmented zero-shot RL framework that integrates (i) RNN- or Transformer-based memory encoders, (ii) reward-free pre-training, (iii) latent state inference, and (iv) awareness of changes in dynamics. Evaluated on partially observable benchmarks spanning multiple domains, the method significantly outperforms memoryless baselines and achieves robust zero-shot transfer across tasks. The implementation code and benchmark suite are publicly released.
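To make the core idea concrete, here is a minimal, illustrative sketch (not the paper's implementation) of a GRU-style memory encoder in NumPy: it folds an observation history into a single latent vector, so two histories that end in the same observation can still be told apart, which is exactly what a memoryless encoder cannot do. All names and dimensions below are assumptions for illustration only.

```python
import numpy as np

class GRUMemory:
    """Toy GRU-style memory encoder (illustrative; hypothetical names/sizes)."""

    def __init__(self, obs_dim, hidden_dim, seed=0):
        rng = np.random.default_rng(seed)
        d = obs_dim + hidden_dim
        # Weights for update gate, reset gate, and candidate state.
        self.Wz = rng.normal(0, 0.1, (hidden_dim, d))
        self.Wr = rng.normal(0, 0.1, (hidden_dim, d))
        self.Wh = rng.normal(0, 0.1, (hidden_dim, d))
        self.hidden_dim = hidden_dim

    @staticmethod
    def _sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def encode(self, observations):
        """Fold a sequence of observations into one latent state vector."""
        h = np.zeros(self.hidden_dim)
        for o in observations:
            x = np.concatenate([o, h])
            z = self._sigmoid(self.Wz @ x)      # update gate
            r = self._sigmoid(self.Wr @ x)      # reset gate
            h_tilde = np.tanh(self.Wh @ np.concatenate([o, r * h]))
            h = (1 - z) * h + z * h_tilde       # interpolate old/new state
        return h

# Two histories ending in the same observation yield different latents.
enc = GRUMemory(obs_dim=2, hidden_dim=4)
hist_a = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
hist_b = [np.array([0.0, 0.0]), np.array([0.0, 1.0])]
za, zb = enc.encode(hist_a), enc.encode(hist_b)
```

A downstream policy or behaviour foundation model would condition on the latent `h` instead of the raw (aliased) observation; the paper's actual encoders are learned RNNs or Transformers rather than this fixed random-weight toy.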
📄 Abstract
Recent work has shown that, under certain assumptions, zero-shot reinforcement learning (RL) methods can generalise to any unseen task in an environment after reward-free pre-training. Access to Markov states is one such assumption, yet in many real-world applications the Markov state is only partially observable. Here, we explore how the performance of standard zero-shot RL methods degrades under partial observability, and show that, as in single-task RL, memory-based architectures are an effective remedy. We evaluate our memory-based zero-shot RL methods in domains where states, rewards, and changes in dynamics are only partially observed, and show improved performance over memory-free baselines. Our code is open-sourced via: https://enjeeneer.io/projects/bfms-with-memory/.