OptionZero: Planning with Learned Options

📅 2025-02-23

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Problem: Planning with options in reinforcement learning has typically relied on predefined options or on options learned from expert demonstrations. Method: The paper proposes OptionZero, which adds an option network to MuZero so that options are discovered autonomously through self-play, and modifies the dynamics network to predict environment transitions for whole options; Monte Carlo tree search can then expand option-level moves and search deeper under the same simulation budget. Contribution/Results: Options are learned without predefined options or expert data. On 26 Atari games, OptionZero achieves a 131.58% improvement in mean human-normalized score over MuZero, and behavior analysis indicates that the learned options reflect strategic skills tailored to each game's characteristics.
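For readers unfamiliar with the metric, the following is a minimal sketch of how a mean human-normalized score is commonly computed in the Atari literature: each game's raw score is rescaled so that a random policy maps to 0% and a human reference maps to 100%, and the results are averaged across games. The function names, game names, and all numbers below are hypothetical placeholders for illustration and are not taken from the paper.

```python
from typing import Dict, Tuple


def human_normalized_score(agent: float, random: float, human: float) -> float:
    """Rescale a raw score so that random play = 0% and human play = 100%."""
    if human == random:
        raise ValueError("human and random reference scores must differ")
    return 100.0 * (agent - random) / (human - random)


def mean_human_normalized_score(
    scores: Dict[str, Tuple[float, float, float]]
) -> float:
    """Average the per-game normalized scores.

    `scores` maps a game name to (agent, random, human) raw scores.
    """
    per_game = [human_normalized_score(a, r, h) for a, r, h in scores.values()]
    return sum(per_game) / len(per_game)


# Hypothetical raw scores, NOT results from the paper.
example = {
    "game_a": (150.0, 50.0, 100.0),  # normalized: 200.0
    "game_b": (30.0, 10.0, 50.0),    # normalized: 50.0
}

print(mean_human_normalized_score(example))  # 125.0
```

Because the mean is taken over per-game percentages, a few games with very large normalized scores can dominate it, which is worth keeping in mind when reading aggregate figures.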

📝 Abstract

Planning with options -- a sequence of primitive actions -- has been shown effective in reinforcement learning within complex environments. Previous studies have focused on planning with predefined options or learned options through expert demonstration data. Inspired by MuZero, which learns superhuman heuristics without any human knowledge, we propose a novel approach, named OptionZero. OptionZero incorporates an option network into MuZero, providing autonomous discovery of options through self-play games. Furthermore, we modify the dynamics network to provide environment transitions when using options, allowing searching deeper under the same simulation constraints. Empirical experiments conducted in 26 Atari games demonstrate that OptionZero outperforms MuZero, achieving a 131.58% improvement in mean human-normalized score. Our behavior analysis shows that OptionZero not only learns options but also acquires strategic skills tailored to different game characteristics. Our findings show promising directions for discovering and using options in planning. Our code is available at https://rlg.iis.sinica.edu.tw/papers/optionzero.
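The claim that options allow deeper search under the same simulation constraints can be illustrated with a toy sketch. It assumes an option is represented as a tuple of primitive actions and that one search simulation expands one edge, whether that edge is a primitive action or a whole option. The number-line transition `toy_step` and every name below are hypothetical; this is not the OptionZero implementation, whose dynamics network is learned and predicts the result of an option directly rather than unrolling it action by action.

```python
from typing import Callable, Sequence, Tuple

State = int
Action = int
Option = Tuple[Action, ...]  # an option is a sequence of primitive actions


def toy_step(state: State, action: Action) -> State:
    """Hypothetical primitive transition: move along a number line."""
    return state + action


def option_step(
    state: State,
    option: Option,
    step: Callable[[State, Action], State] = toy_step,
) -> State:
    """Apply a whole option as a single option-level transition.

    This toy version unrolls the primitive transitions; a learned model
    would be trained to output the resulting state in one call.
    """
    for action in option:
        state = step(state, action)
    return state


def depth_reached(path: Sequence[Option]) -> int:
    """Count the primitive steps covered by a path of option-level edges."""
    return sum(len(option) for option in path)


budget = 3  # simulations spent along one search path (illustrative)
primitive_path = [(1,)] * budget    # each edge is a single primitive action
option_path = [(1, 1, 1)] * budget  # each edge is a length-3 option

print(depth_reached(primitive_path))  # 3
print(depth_reached(option_path))     # 9
```

With the same three expansions, the option-level path looks nine primitive steps ahead instead of three, which is the intuition behind searching deeper under a fixed simulation budget.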

Problem

Research questions and friction points this paper is trying to address.

Autonomous discovery of options

Enhancing reinforcement learning efficiency

Strategic skills acquisition in games

Innovation

Methods, ideas, or system contributions that make the work stand out.

Autonomous option discovery

Enhanced dynamics network

Self-play game integration

🔎 Similar Papers

Can Learned Optimization Make Reinforcement Learning Less Difficult?