🤖 AI Summary
To address the challenges of task decomposition, low policy learning efficiency, and poor generalization in hierarchical reinforcement learning under long-horizon sparse-reward settings, this paper proposes a cross-layer nested options mechanism coupled with an inner-layer policy guidance strategy. Specifically, it constructs reusable and nestable modular options within an abstract state space to enable hierarchical modeling of high-level actions and transfer of motion patterns across tasks. Task-action integration and customized reward shaping are further employed to enhance decision interpretability and safety guarantees. Experimental results on procedurally generated grid-world environments demonstrate significant improvements: +37% in sample efficiency and +29% in cross-task generalization performance. These findings validate the method’s effectiveness, scalability, and suitability for complex, safety-critical, and industrial applications.
📝 Abstract
This paper introduces MANGO (Multilayer Abstraction for Nested Generation of Options), a novel hierarchical reinforcement learning framework designed to address the challenges of long-term sparse reward environments. MANGO decomposes complex tasks into multiple layers of abstraction, where each layer defines an abstract state space and employs options to modularize trajectories into macro-actions. These options are nested across layers, allowing for efficient reuse of learned movements and improved sample efficiency. The framework introduces intra-layer policies that guide the agent's transitions within the abstract state space, and task actions that integrate task-specific components such as reward functions. Experiments conducted in procedurally-generated grid environments demonstrate substantial improvements in both sample efficiency and generalization capabilities compared to standard RL methods. MANGO also enhances interpretability by making the agent's decision-making process transparent across layers, which is particularly valuable in safety-critical and industrial applications. Future work will explore automated discovery of abstractions and abstract actions, adaptation to continuous or fuzzy environments, and more robust multi-layer training strategies.