Strict Subgoal Execution: Reliable Long-Horizon Planning in Hierarchical Reinforcement Learning

📅 2025-06-26
📈 Citations: 0
✨ Influential: 0
📄 PDF
🤖 AI Summary
Addressing the fundamental challenges of sparse rewards and infeasible subgoals in long-horizon goal-oriented tasks, this paper proposes a graph-structured hierarchical reinforcement learning framework. The method constructs a task graph where nodes represent abstract states or subgoals and edges encode transition feasibility. First, it constrains the high-level policy's action space to select only subgoals reachable within one low-level episode, ensuring planning feasibility. Second, it introduces a strict subgoal execution mechanism coupled with failure-aware path optimization, dynamically updating edge costs using low-level success-rate feedback. Third, it decouples exploration strategies to enhance systematic state-space coverage. Experiments across multiple long-horizon benchmark tasks demonstrate significant improvements in task success rate and sample efficiency, outperforming state-of-the-art goal-oriented and hierarchical RL approaches.
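The failure-aware path optimization described above can be sketched as a small subgoal graph whose edge costs are refreshed from low-level success-rate feedback, with planning done by shortest path over the refined costs. This is an illustrative sketch only: the class name, the running success-rate counters, and the `-log(success_rate)` cost rule are assumptions for exposition, not the paper's exact formulation.

```python
import heapq
import math
from collections import defaultdict

class SubgoalGraph:
    """Toy subgoal graph with success-rate-driven edge costs (illustrative)."""

    def __init__(self):
        self.edges = defaultdict(dict)             # u -> {v: cost}
        self.stats = defaultdict(lambda: [1, 2])   # (u, v) -> [successes, attempts]

    def add_edge(self, u, v):
        self.edges[u][v] = 1.0                     # neutral initial cost

    def record_outcome(self, u, v, success):
        # Update running success statistics from low-level execution feedback.
        s, n = self.stats[(u, v)]
        self.stats[(u, v)] = [s + int(success), n + 1]
        rate = self.stats[(u, v)][0] / self.stats[(u, v)][1]
        # Low success rate -> high cost, steering the planner away from
        # unreliable transitions (assumed cost rule, not from the paper).
        self.edges[u][v] = -math.log(max(rate, 1e-6))

    def plan(self, start, goal):
        # Dijkstra over the refined edge costs.
        dist, prev = {start: 0.0}, {}
        pq = [(0.0, start)]
        while pq:
            d, u = heapq.heappop(pq)
            if u == goal:
                break
            if d > dist.get(u, float("inf")):
                continue
            for v, c in self.edges[u].items():
                nd = d + c
                if nd < dist.get(v, float("inf")):
                    dist[v], prev[v] = nd, u
                    heapq.heappush(pq, (nd, v))
        if goal not in dist:
            return None
        path, node = [goal], goal
        while node != start:
            node = prev[node]
            path.append(node)
        return path[::-1]
```

For example, if edge (A, B) repeatedly fails at the low level, its cost rises and the planner reroutes through an alternative subgoal chain such as A → C → D instead of A → B → D.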

πŸ“ Abstract
Long-horizon goal-conditioned tasks pose fundamental challenges for reinforcement learning (RL), particularly when goals are distant and rewards are sparse. While hierarchical and graph-based methods offer partial solutions, they often suffer from subgoal infeasibility and inefficient planning. We introduce Strict Subgoal Execution (SSE), a graph-based hierarchical RL framework that enforces single-step subgoal reachability by structurally constraining high-level decision-making. To enhance exploration, SSE employs a decoupled exploration policy that systematically traverses underexplored regions of the goal space. Furthermore, a failure-aware path refinement mechanism refines graph-based planning by dynamically adjusting edge costs according to observed low-level success rates, thereby improving subgoal reliability. Experimental results across diverse long-horizon benchmarks demonstrate that SSE consistently outperforms existing goal-conditioned RL and hierarchical RL approaches in both efficiency and success rate.
Problem

Research questions and friction points this paper is trying to address.

Addresses subgoal infeasibility in hierarchical reinforcement learning
Improves exploration in sparse-reward long-horizon tasks
Enhances planning reliability via dynamic path refinement
Innovation

Methods, ideas, or system contributions that make the work stand out.

Enforces single-step subgoal reachability structurally
Uses decoupled exploration for systematic traversal
Implements failure-aware path refinement dynamically
Jaebak Hwang
Graduate School of Artificial Intelligence, UNIST, Ulsan, South Korea 44919
Sanghyeon Lee
Graduate School of Artificial Intelligence, UNIST, Ulsan, South Korea 44919
Jeongmo Kim
Graduate School of Artificial Intelligence, UNIST, Ulsan, South Korea 44919
Seungyul Han
Assistant Professor, Graduate School of AI, UNIST
Reinforcement Learning, Machine Learning, Intelligent Control, Signal Processing