Strict Subgoal Execution: Reliable Long-Horizon Planning in Hierarchical Reinforcement Learning

📅 2025-06-26
📈 Citations: 0
✨ Influential: 0
📄 PDF
🤖 AI Summary
Addressing the fundamental challenges of sparse rewards and infeasible subgoals in long-horizon goal-oriented tasks, this paper proposes a graph-structured hierarchical reinforcement learning framework. The method constructs a task graph where nodes represent abstract states or subgoals and edges encode transition feasibility. First, it constrains the high-level policy's action space to select only subgoals reachable within one low-level episode, ensuring planning feasibility. Second, it introduces a strict subgoal execution mechanism coupled with failure-aware path optimization, dynamically updating edge costs using low-level success-rate feedback. Third, it decouples exploration strategies to enhance systematic state-space coverage. Experiments across multiple long-horizon benchmark tasks demonstrate significant improvements in task success rate and sample efficiency, outperforming state-of-the-art goal-oriented and hierarchical RL approaches.
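The failure-aware path optimization described above can be sketched as a small subgoal graph whose edge costs are refreshed from low-level success-rate feedback, with planning done by shortest path over the refined costs. This is an illustrative sketch only: the class name, the running success-rate counters, and the `-log(success_rate)` cost rule are assumptions for exposition, not the paper's exact formulation.

```python
import heapq
import math
from collections import defaultdict

class SubgoalGraph:
    """Toy subgoal graph with success-rate-driven edge costs (illustrative)."""

    def __init__(self):
        self.edges = defaultdict(dict)             # u -> {v: cost}
        self.stats = defaultdict(lambda: [1, 2])   # (u, v) -> [successes, attempts]

    def add_edge(self, u, v):
        self.edges[u][v] = 1.0                     # neutral initial cost

    def record_outcome(self, u, v, success):
        # Update running success statistics from low-level execution feedback.
        s, n = self.stats[(u, v)]
        self.stats[(u, v)] = [s + int(success), n + 1]
        rate = self.stats[(u, v)][0] / self.stats[(u, v)][1]
        # Low success rate -> high cost, steering the planner away from
        # unreliable transitions (assumed cost rule, not from the paper).
        self.edges[u][v] = -math.log(max(rate, 1e-6))

    def plan(self, start, goal):
        # Dijkstra over the refined edge costs.
        dist, prev = {start: 0.0}, {}
        pq = [(0.0, start)]
        while pq:
            d, u = heapq.heappop(pq)
            if u == goal:
                break
            if d > dist.get(u, float("inf")):
                continue
            for v, c in self.edges[u].items():
                nd = d + c
                if nd < dist.get(v, float("inf")):
                    dist[v], prev[v] = nd, u
                    heapq.heappush(pq, (nd, v))
        if goal not in dist:
            return None
        path, node = [goal], goal
        while node != start:
            node = prev[node]
            path.append(node)
        return path[::-1]
```

For example, if edge (A, B) repeatedly fails at the low level, its cost rises and the planner reroutes through an alternative subgoal chain such as A → C → D instead of A → B → D.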

πŸ“ Abstract
Long-horizon goal-conditioned tasks pose fundamental challenges for reinforcement learning (RL), particularly when goals are distant and rewards are sparse. While hierarchical and graph-based methods offer partial solutions, they often suffer from subgoal infeasibility and inefficient planning. We introduce Strict Subgoal Execution (SSE), a graph-based hierarchical RL framework that enforces single-step subgoal reachability by structurally constraining high-level decision-making. To enhance exploration, SSE employs a decoupled exploration policy that systematically traverses underexplored regions of the goal space. Furthermore, a failure-aware path refinement mechanism refines graph-based planning by dynamically adjusting edge costs according to observed low-level success rates, thereby improving subgoal reliability. Experimental results across diverse long-horizon benchmarks demonstrate that SSE consistently outperforms existing goal-conditioned RL and hierarchical RL approaches in both efficiency and success rate.
Problem

Research questions and friction points this paper is trying to address.

Addresses subgoal infeasibility in hierarchical reinforcement learning
Improves exploration in sparse-reward long-horizon tasks
Enhances planning reliability via dynamic path refinement
Innovation

Methods, ideas, or system contributions that make the work stand out.

Enforces single-step subgoal reachability structurally
Uses decoupled exploration for systematic traversal
Implements failure-aware path refinement dynamically
Jaebak Hwang
Graduate School of Artificial Intelligence, UNIST, Ulsan, South Korea 44919
Sanghyeon Lee
Graduate School of Artificial Intelligence, UNIST, Ulsan, South Korea 44919
Jeongmo Kim
Graduate School of Artificial Intelligence, UNIST, Ulsan, South Korea 44919
Seungyul Han
Assistant Professor, Graduate School of AI, UNIST
Reinforcement Learning, Machine Learning, Intelligent Control, Signal Processing