AI Summary
This work addresses non-stationary reinforcement learning in settings where no task identifiers or environmental change signals are available, a setting in which existing deep reinforcement learning methods often struggle to adapt. The paper introduces Space-sampled Value Decay (SVD), an explicit forgetting mechanism inspired by biological memory processes. SVD is applied as a modification to value-based deep RL architectures, here DQN and SAC, so that value estimates can adjust without prior knowledge of environmental shifts. Experiments on non-stationary environments show positive effects on achieved returns, along with limitations that the paper discusses. The approach brings biologically inspired forgetting to non-stationary reinforcement learning without requiring task labels.
Abstract
Studies have shown that rodents such as mice can adapt their behavior to changing parameters of the environment (``drift'') even when no information about the change is provided (uncertainty), a behavior that can be modeled by forgetting mechanisms. Non-stationary Reinforcement Learning (NSRL) adapts state-of-the-art RL methods to changing environments; these methods, however, usually require (partially) perfect information about the drift, such as ``task IDs'' or ``context''. To mitigate the effects of drift, this work develops \emph{Space-sampled Value Decay}, a simple yet effective explicit forgetting mechanism for value-based deep RL architectures. In particular, we demonstrate and discuss positive effects, but also limitations, in achieved returns for modifications of Deep Q-networks (DQN) and Soft Actor-Critic (SAC) when evaluated on non-stationary environments.
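The abstract does not state the update rule behind Space-sampled Value Decay, so the following is only a generic illustration of explicit forgetting in a value-based learner, not the paper's method. It assumes a tabular Q-function, uniformly sampled states, and multiplicative decay toward a baseline value; the names `decay_sampled_values`, `q_update`, `decay`, `n_samples`, and `baseline` are hypothetical and chosen for this sketch.

```python
import random


def decay_sampled_values(q, n_samples, decay, rng, baseline=0.0):
    """Shrink the Q-values of randomly sampled states toward `baseline`.

    Assumption for illustration: states are sampled uniformly without
    replacement and decayed multiplicatively. The paper's actual sampling
    and decay scheme is not given in the abstract.
    """
    states = rng.sample(range(len(q)), n_samples)
    for s in states:
        q[s] = [baseline + decay * (v - baseline) for v in q[s]]
    return states


def q_update(q, s, a, r, s_next, alpha=0.1, gamma=0.9):
    """Standard one-step tabular Q-learning update."""
    td_target = r + gamma * max(q[s_next])
    q[s][a] += alpha * (td_target - q[s][a])


# Usage: interleave forgetting with ordinary learning steps.
rng = random.Random(0)
q = [[1.0, 2.0] for _ in range(4)]  # 4 states, 2 actions
touched = decay_sampled_values(q, n_samples=2, decay=0.5, rng=rng)
q_update(q, s=0, a=1, r=1.0, s_next=1)
```

The intuition in this sketch is that stale value estimates fade without any drift signal, so newly observed rewards can dominate after the environment changes. A deep RL variant would have to apply the decay through the network's outputs or targets rather than through table entries, and how that is done is not specified in the text above.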