OptionZero: Planning with Learned Options

📅 2025-02-23
📈 Citations: 0
Influential: 0
🤖 AI Summary
Problem: Planning with options in reinforcement learning has so far relied on predefined options or on options learned from expert demonstrations. Method: The paper proposes OptionZero, which adds an option network to MuZero so that options are discovered autonomously through self-play, and modifies the dynamics network to model environment transitions over whole options, letting the tree search reach greater depth under the same simulation budget. Results: On 26 Atari games, OptionZero achieves a 131.58% improvement in mean human-normalized score over MuZero, and behavior analysis indicates that the learned options reflect strategic skills tailored to different game characteristics.
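For readers unfamiliar with the metric, the following is a minimal sketch of how a mean human-normalized score is conventionally computed on Atari: each game's raw score is rescaled so that a random policy maps to 0% and the human reference maps to 100%, and the results are averaged over games. The function name and all raw scores below are made up for illustration; they are not results from the paper.

```python
from statistics import mean


def human_normalized(agent: float, random: float, human: float) -> float:
    """Rescale a raw score so that random play is 0 and the human reference is 100."""
    return 100.0 * (agent - random) / (human - random)


# Hypothetical raw scores per game: (agent, random baseline, human baseline).
# These numbers are invented purely to illustrate the arithmetic.
games = {
    "game_a": (150.0, 50.0, 250.0),
    "game_b": (900.0, 100.0, 500.0),
}

per_game = {name: human_normalized(*scores) for name, scores in games.items()}
mean_hns = mean(per_game.values())

print(per_game)   # {'game_a': 50.0, 'game_b': 200.0}
print(mean_hns)   # 125.0
```

Because the mean is taken over per-game percentages, a few games with very high normalized scores can dominate it, which is worth keeping in mind when reading a single aggregate figure.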


📝 Abstract
Planning with options, where an option is a sequence of primitive actions, has been shown to be effective in reinforcement learning within complex environments. Previous studies have focused on planning with predefined options or with options learned from expert demonstration data. Inspired by MuZero, which learns superhuman heuristics without any human knowledge, we propose a novel approach, named OptionZero. OptionZero incorporates an option network into MuZero, providing autonomous discovery of options through self-play games. Furthermore, we modify the dynamics network to provide environment transitions when using options, allowing deeper search under the same simulation constraints. Empirical experiments conducted in 26 Atari games demonstrate that OptionZero outperforms MuZero, achieving a 131.58% improvement in mean human-normalized score. Our behavior analysis shows that OptionZero not only learns options but also acquires strategic skills tailored to different game characteristics. Our findings show promising directions for discovering and using options in planning. Our code is available at https://rlg.iis.sinica.edu.tw/papers/optionzero.
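To make the "deeper search under the same simulation constraints" point concrete, here is a toy sketch, not the authors' implementation. It treats an option as a tuple of primitive actions and counts how many primitive steps a search path covers when each simulation adds one edge. `toy_step`, `apply_option`, and `depth_reached` are hypothetical names, and `toy_step` is an arbitrary stand-in for a learned dynamics model.

```python
from typing import Callable, Sequence, Tuple

State = int
Action = int
Option = Tuple[Action, ...]  # an option is a sequence of primitive actions


def toy_step(state: State, action: Action) -> State:
    """Hypothetical stand-in for a learned one-step dynamics model."""
    return state * 3 + action


def apply_option(
    state: State,
    option: Option,
    step: Callable[[State, Action], State] = toy_step,
) -> State:
    """Advance the state through every primitive action in the option."""
    for action in option:
        state = step(state, action)
    return state


def depth_reached(edges: Sequence[Option]) -> int:
    """Number of primitive steps covered by a path of search edges."""
    return sum(len(option) for option in edges)


# Assume each simulation extends the path by one edge (an illustrative budget).
budget = 4
primitive_path = [(1,)] * budget      # every edge is a single primitive action
option_path = [(1, 1, 2)] * budget    # every edge is a three-action option

print(depth_reached(primitive_path))  # 4
print(depth_reached(option_path))     # 12
print(apply_option(0, (1, 1, 2)))     # 14
```

In this sketch the option edge is evaluated by looping over a one-step model; the abstract instead describes modifying the dynamics network itself to provide transitions for options, the details of which are not given here.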
Problem

Research questions and friction points this paper is trying to address.

Autonomous discovery of options
Enhancing reinforcement learning efficiency
Strategic skills acquisition in games
Innovation

Methods, ideas, or system contributions that make the work stand out.

Autonomous option discovery
Enhanced dynamics network
Self-play game integration
Po-Wei Huang
Institute of Information Science, Academia Sinica, Taiwan; Department of Computer Science, National Yang Ming Chiao Tung University, Taiwan
Pei-Chiun Peng
Institute of Information Science, Academia Sinica, Taiwan; Department of Computer Science, National Yang Ming Chiao Tung University, Taiwan
Hung Guei
Institute of Information Science, Academia Sinica, Taiwan
Ti-Rong Wu
Institute of Information Science, Academia Sinica
Reinforcement learning, Planning, Computer games, Deep learning, Artificial intelligence