PEAR: Primitive Enabled Adaptive Relabeling for Boosting Hierarchical Reinforcement Learning

📅 2023-06-10
🏛️ arXiv.org
📈 Citations: 1
Influential: 0
📄 PDF
🤖 AI Summary
Hierarchical reinforcement learning (HRL) suffers from training instability due to the non-stationarity induced by coupling between policy levels across temporal abstractions. To address this, we propose a two-stage off-policy HRL framework: first, leveraging a small set of expert demonstrations, we design a primitive-driven adaptive subgoal relabeling mechanism to generate high-quality subgoal supervision signals; second, we jointly optimize the high-level (subgoal selection) and low-level (primitive action) policies. Our approach remains broadly applicable under weak task-structure assumptions and minimal expert data, with theoretical analysis establishing a suboptimality bound for subgoal relabeling. The framework is algorithm-agnostic and integrates seamlessly with mainstream off-policy RL algorithms, including SAC and TD3. Empirically, it achieves up to an 80% success rate on complex, sparse-reward robotic control tasks, substantially outperforming both hierarchical and end-to-end baselines. Its efficacy is further validated on a real-world robot platform.
📝 Abstract
Hierarchical reinforcement learning (HRL) has the potential to solve complex long-horizon tasks using temporal abstraction and increased exploration. However, hierarchical agents are difficult to train due to inherent non-stationarity. We present primitive enabled adaptive relabeling (PEAR), a two-phase approach where we first perform adaptive relabeling on a few expert demonstrations to generate efficient subgoal supervision, and then jointly optimize HRL agents by employing reinforcement learning (RL) and imitation learning (IL). We perform theoretical analysis to bound the sub-optimality of our approach and derive a joint optimization framework using RL and IL. Since PEAR utilizes only a few expert demonstrations and considers minimal limiting assumptions on the task structure, it can be easily integrated with typical off-policy RL algorithms to produce a practical HRL approach. We perform extensive experiments on challenging environments and show that PEAR is able to outperform various hierarchical and non-hierarchical baselines and achieve up to 80% success rates in complex sparse robotic control tasks where other baselines typically fail to show significant progress. We also perform ablations to thoroughly analyse the importance of our various design choices. Finally, we perform real-world robotic experiments on complex tasks and demonstrate that PEAR consistently outperforms the baselines.
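A joint RL-and-IL objective of the kind the abstract describes is commonly written as an off-policy RL term regularized by an imitation term on the relabeled demonstrations. The weighting $\lambda$ and the squared-error behavior-cloning loss below are a generic illustration of this family of objectives, not the paper's exact formulation:

$$
\max_{\theta} \; J(\theta) \;=\; \underbrace{\mathbb{E}_{\tau \sim \pi_\theta}\Big[\textstyle\sum_t \gamma^t r_t\Big]}_{\text{RL objective}} \;-\; \lambda \, \underbrace{\mathbb{E}_{(s,\,a^*) \sim \mathcal{D}_e}\big[\lVert \pi_\theta(s) - a^* \rVert^2\big]}_{\text{IL term on relabeled demonstrations}}
$$

where $\mathcal{D}_e$ denotes the relabeled expert data: subgoal labels supervise the high-level policy, and primitive actions supervise the low-level policy.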
Problem

Research questions and friction points this paper is trying to address.

Enhance hierarchical reinforcement learning efficiency
Overcome non-stationarity in training hierarchical agents
Improve success rates in complex robotic tasks
Innovation

Methods, ideas, or system contributions that make the work stand out.

Adaptive relabeling for subgoal supervision
Joint optimization using RL and IL
Integration with off-policy RL algorithms
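The adaptive relabeling idea can be sketched as follows: slide along an expert trajectory and, from each state, label the furthest future state that the current low-level (primitive) policy can still reach as that state's subgoal. The `reachable` predicate below is a hypothetical stand-in for the paper's primitive-based test (e.g. a Q-value threshold); the function name and the toy 1-D setup are illustrative assumptions, not the paper's implementation.

```python
from typing import Callable, List, Sequence, Tuple

State = Tuple[float, ...]

def adaptive_relabel(
    trajectory: Sequence[State],
    reachable: Callable[[State, State], bool],
) -> List[Tuple[State, State]]:
    """Return (state, subgoal) pairs mined from an expert trajectory.

    For each anchor state, scan forward to the furthest future state the
    low-level policy is judged able to reach; that state becomes the
    high-level subgoal label, and the scan restarts from it.
    """
    labels: List[Tuple[State, State]] = []
    i = 0
    while i < len(trajectory) - 1:
        # Extend the candidate subgoal as far as the primitive can reach.
        j = i + 1
        while j + 1 < len(trajectory) and reachable(trajectory[i], trajectory[j + 1]):
            j += 1
        labels.append((trajectory[i], trajectory[j]))
        i = j  # continue from the chosen subgoal
    return labels

if __name__ == "__main__":
    # Toy 1-D trajectory; pretend the primitive covers at most 2 units.
    traj = [(0.0,), (1.0,), (2.0,), (3.0,), (4.0,)]
    print(adaptive_relabel(traj, lambda s, g: abs(g[0] - s[0]) <= 2.0))
    # → [((0.0,), (2.0,)), ((2.0,), (4.0,))]
```

Because the subgoal spacing adapts to what the low-level policy can currently achieve, the high-level supervision stays feasible as both levels improve, which is the non-stationarity the paper targets.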
Utsav Singh
Department of Computer Science, Indian Institute of Technology, Kanpur, India
Vinay P. Namboodiri
Department of Computer Science, University of Bath
Computer Vision · Image Processing · Machine Learning