An Empirical Study of Deep Reinforcement Learning in Continuing Tasks

📅 2025-01-12
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work addresses the underexplored behavior of deep reinforcement learning (DRL) in continuing tasks: non-episodic interactions that extend indefinitely and cannot be broken into episodes. To enable systematic evaluation, the authors introduce a suite of continuing-task testbeds based on MuJoCo and Atari environments and use it to assess well-known algorithms including DQN, SAC, and PPO, identifying their typical failure modes in such settings through large-scale empirical analysis. They then study reward centering, a technique within the temporal-difference (TD) framework introduced by Naik et al. (2024), extending its applicability beyond Q-learning and small-scale discrete domains to high-dimensional continuous control and complex Atari tasks. Experiments show that reward centering improves training stability and final performance across multiple TD-based algorithms and outperforms two alternative centering strategies, narrowing the gap between theoretical TD principles and practical deployment in non-episodic settings.

📝 Abstract
In reinforcement learning (RL), continuing tasks refer to tasks where the agent-environment interaction is ongoing and cannot be broken down into episodes. These tasks are suitable when environment resets are unavailable, agent-controlled, or predefined but where all rewards, including those beyond resets, are critical. These scenarios frequently occur in real-world applications and cannot be modeled by episodic tasks. While modern deep RL algorithms have been extensively studied and well understood in episodic tasks, their behavior in continuing tasks remains underexplored. To address this gap, we provide an empirical study of several well-known deep RL algorithms using a suite of continuing task testbeds based on MuJoCo and Atari environments, highlighting several key insights concerning continuing tasks. Using these testbeds, we also investigate the effectiveness of a method for improving temporal-difference-based RL algorithms in continuing tasks by centering rewards, as introduced by Naik et al. (2024). While their work primarily focused on this method in conjunction with Q-learning, our results extend their findings by demonstrating that this method is effective across a broader range of algorithms, scales to larger tasks, and outperforms two other reward-centering approaches.
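To make the idea concrete, here is a minimal sketch of reward centering in a tabular TD(0) update on a toy continuing task. The environment (a deterministic two-state reward cycle), the learning rates, and the use of a simple running average of observed rewards as the centering estimate are illustrative assumptions, not the paper's actual setup; Naik et al. (2024) also study centering estimates driven by the TD error itself.

```python
import numpy as np

def td0_reward_centering(rewards_cycle, n_steps=50_000,
                         alpha=0.05, eta=0.01, gamma=0.99):
    """TD(0) on a deterministic reward cycle with simple reward centering:
    a running average-reward estimate rbar is subtracted from each reward
    before the TD update, so values measure reward relative to average."""
    n = len(rewards_cycle)
    V = np.zeros(n)   # one state per position in the cycle
    rbar = 0.0        # running estimate of the average reward
    s = 0
    for _ in range(n_steps):
        s_next = (s + 1) % n          # continuing task: no resets, ever
        r = rewards_cycle[s]
        rbar += eta * (r - rbar)      # simple centering: running mean of rewards
        delta = (r - rbar) + gamma * V[s_next] - V[s]
        V[s] += alpha * delta
        s = s_next
    return V, rbar

V, rbar = td0_reward_centering([1.0, 0.0])
```

On the cycle with rewards [1, 0], `rbar` converges to the true average reward 0.5, and the centered values converge to roughly ±0.5/(1+γ), staying bounded as γ approaches 1, which is the motivation for centering in continuing tasks.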
Problem

Research questions and friction points this paper is trying to address.

Continuing Tasks
Deep Reinforcement Learning
Reward Centering
Innovation

Methods, ideas, or system contributions that make the work stand out.

Continuing Tasks
Reward Centering
Deep Reinforcement Learning