AI Summary
Addressing the fundamental challenges of sparse rewards and infeasible subgoals in long-horizon goal-oriented tasks, this paper proposes a graph-structured hierarchical reinforcement learning framework. Our method constructs a task graph where nodes represent abstract states or subgoals and edges encode transition feasibility. First, we constrain the high-level policy's action space to select only subgoals reachable within one low-level episode, ensuring planning feasibility. Second, we introduce a strict subgoal execution mechanism coupled with failure-aware path optimization, dynamically updating edge costs using low-level success-rate feedback. Third, we decouple exploration strategies to enhance systematic state-space coverage. Experiments across multiple long-horizon benchmark tasks demonstrate significant improvements in task success rate and sample efficiency, outperforming state-of-the-art goal-oriented and hierarchical RL approaches.
Abstract
Long-horizon goal-conditioned tasks pose fundamental challenges for reinforcement learning (RL), particularly when goals are distant and rewards are sparse. While hierarchical and graph-based methods offer partial solutions, they often suffer from subgoal infeasibility and inefficient planning. We introduce Strict Subgoal Execution (SSE), a graph-based hierarchical RL framework that enforces single-step subgoal reachability by structurally constraining high-level decision-making. To enhance exploration, SSE employs a decoupled exploration policy that systematically traverses underexplored regions of the goal space. Furthermore, a failure-aware path refinement mechanism refines graph-based planning by dynamically adjusting edge costs according to observed low-level success rates, thereby improving subgoal reliability. Experimental results across diverse long-horizon benchmarks demonstrate that SSE consistently outperforms existing goal-conditioned and hierarchical RL approaches in both efficiency and success rate.
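The failure-aware path refinement described above can be sketched in a few lines: edges of the subgoal graph accumulate low-level success/failure counts, edge costs are derived from the estimated success rate, and the planner runs a shortest-path search over those costs. The `SubgoalGraph` class, the Laplace-smoothed estimate, and the `-log(p)` cost mapping below are illustrative assumptions for the sketch, not the paper's exact formulation.

```python
import heapq
import math
from collections import defaultdict


class SubgoalGraph:
    """Minimal sketch of failure-aware path refinement (assumed design:
    edge cost = -log of a Laplace-smoothed low-level success rate)."""

    def __init__(self):
        # u -> {v: (successes, attempts)} for each subgoal transition
        self.edges = defaultdict(dict)

    def add_edge(self, u, v):
        self.edges[u].setdefault(v, (0, 0))

    def record_outcome(self, u, v, success):
        """Feed back whether the low-level policy reached v from u."""
        s, n = self.edges[u][v]
        self.edges[u][v] = (s + int(success), n + 1)

    def cost(self, u, v):
        s, n = self.edges[u][v]
        p = (s + 1) / (n + 2)        # Laplace smoothing avoids p = 0
        return -math.log(p)          # unreliable edges become expensive

    def shortest_path(self, start, goal):
        """Dijkstra over success-rate-derived edge costs."""
        dist, prev = {start: 0.0}, {}
        pq = [(0.0, start)]
        while pq:
            d, u = heapq.heappop(pq)
            if u == goal:
                break
            if d > dist.get(u, float("inf")):
                continue
            for v in self.edges[u]:
                nd = d + self.cost(u, v)
                if nd < dist.get(v, float("inf")):
                    dist[v], prev[v] = nd, u
                    heapq.heappush(pq, (nd, v))
        if goal not in dist:
            return None
        path = [goal]
        while path[-1] != start:
            path.append(prev[path[-1]])
        return path[::-1]
```

Because costs are additive negative log-probabilities, the shortest path is the subgoal sequence with the highest product of estimated per-edge success rates, so repeated low-level failures on an edge automatically divert the planner to more reliable routes.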