Adaptive Q-Network: On-the-fly Target Selection for Deep Reinforcement Learning

📅 2024-05-25
🏛️ arXiv.org
📈 Citations: 1
Influential: 0
📄 PDF
🤖 AI Summary
Deep reinforcement learning (DRL) suffers from hyperparameter sensitivity, low sample efficiency, and training non-stationarity, hindering practical deployment; existing AutoRL approaches rely on auxiliary sampling and fail to account for RL’s intrinsic non-stationary dynamics. To address this, we propose Adaptive Q-Networks (AdaQN), the first AutoRL framework featuring an online, co-adaptive hyperparameter selection mechanism explicitly designed for RL non-stationarity. AdaQN employs parallel Q-function learning with dynamic target switching guided by minimal approximation error—enabling joint optimization of multiple hyperparameters without additional environment interactions. It is fully compatible with any critic-based DRL algorithm. We provide theoretical convergence guarantees under standard assumptions. Empirical evaluation on MuJoCo and Atari 2600 benchmarks demonstrates substantial improvements in sample efficiency, training stability, and robustness to random initialization, consistently outperforming state-of-the-art AutoRL methods.

Technology Category

Application Category

📝 Abstract
Deep Reinforcement Learning (RL) is well known for being highly sensitive to hyperparameters, requiring practitioners substantial efforts to optimize them for the problem at hand. This also limits the applicability of RL in real-world scenarios. In recent years, the field of automated Reinforcement Learning (AutoRL) has grown in popularity by trying to address this issue. However, these approaches typically hinge on additional samples to select well-performing hyperparameters, hindering sample-efficiency and practicality. Furthermore, most AutoRL methods are heavily based on already existing AutoML methods, which were originally developed neglecting the additional challenges inherent to RL due to its non-stationarities. In this work, we propose a new approach for AutoRL, called Adaptive $Q$-Network (AdaQN), that is tailored to RL to take into account the non-stationarity of the optimization procedure without requiring additional samples. AdaQN learns several $Q$-functions, each one trained with different hyperparameters, which are updated online using the $Q$-function with the smallest approximation error as a shared target. Our selection scheme simultaneously handles different hyperparameters while coping with the non-stationarity induced by the RL optimization procedure and being orthogonal to any critic-based RL algorithm. We demonstrate that AdaQN is theoretically sound and empirically validate it in MuJoCo control problems and Atari $2600$ games, showing benefits in sample-efficiency, overall performance, robustness to stochasticity and training stability.
Problem

Research questions and friction points this paper is trying to address.

Addresses hyperparameter sensitivity in Deep Reinforcement Learning.
Proposes Adaptive Q-Network for efficient hyperparameter selection.
Enhances sample-efficiency and robustness in RL applications.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Adaptive Q-Network for AutoRL
Online hyperparameter update without extra samples
Handles non-stationarity in RL optimization