🤖 AI Summary
This work investigates how a recurrent neural network (RNN) spontaneously develops internal planning when trained on a complex sequential decision-making task with irreversible actions, exemplified by Sokoban. Using behavioral analysis, causal interventions on neural activations, and evaluation on out-of-distribution levels, we demonstrate that the trained RNN learns a causal plan representation that predicts its future actions roughly 50 steps in advance, with the quality and length of the plan improving over the first few steps of a level. We also identify an emergent "pacing" behavior: the model walks in cycles at the start of a level to give itself extra computation, extending its effective planning horizon and improving solution quality, and we show this behavior is incentivized by training. The learned plan representations are robust and generalize to out-of-distribution puzzles substantially larger than those seen in training. To foster reproducibility and further research, we publicly release all models and code. This study establishes the network as a model organism for probing and characterizing learned planning in neural networks.
📝 Abstract
Planning is essential for solving complex tasks, yet the internal mechanisms underlying planning in neural networks remain poorly understood. Building on prior work, we analyze a recurrent neural network (RNN) trained on Sokoban, a challenging puzzle requiring sequential, irreversible decisions. We find that the RNN has a causal plan representation that predicts its future actions about 50 steps in advance. The quality and length of the represented plan increase over the first few steps. We uncover a surprising behavior: the RNN "paces" in cycles to give itself extra computation at the start of a level, and we show that this behavior is incentivized by training. Leveraging these insights, we extend the trained RNN to significantly larger, out-of-distribution Sokoban puzzles, demonstrating robust representations beyond the training regime. We open-source our model and code, and believe the neural network's interesting behavior makes it an excellent model organism for deepening our understanding of learned planning.
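The kind of plan-representation claim made above is typically tested with linear probes: a linear readout is fit from the RNN's hidden state at time t to the action the agent takes k steps later, and high held-out probe accuracy is evidence that the future action is encoded in the state. Below is a minimal, self-contained sketch of that idea on fabricated data; the hidden states, the horizon `k`, and the noise level are all synthetic stand-ins (in the actual study the states would come from the trained Sokoban RNN and the labels from its future actions), not the authors' pipeline.

```python
import numpy as np

rng = np.random.default_rng(0)
T, H, A = 600, 32, 4   # timesteps, hidden size, number of actions (e.g. up/down/left/right)
k = 10                 # probe horizon: decode the action taken k steps in the future

# Synthetic data: by construction, the hidden state at time t noisily
# encodes the action the agent will take at time t + k.
actions = rng.integers(0, A, size=T)
W_true = rng.normal(size=(A, H))                    # hypothetical encoding directions
X = W_true[actions[k:]] + 0.1 * rng.normal(size=(T - k, H))
y = actions[k:]

# Fit a linear probe by least squares on one-hot targets (train/test split).
n_train = len(y) * 3 // 4
Y_train = np.eye(A)[y[:n_train]]
W_probe, *_ = np.linalg.lstsq(X[:n_train], Y_train, rcond=None)

# Held-out accuracy: high accuracy means the future action is linearly
# decodable from the state, the basic evidence for a plan representation.
pred = (X[n_train:] @ W_probe).argmax(axis=1)
acc = (pred == y[n_train:]).mean()
print(f"probe accuracy at horizon k={k}: {acc:.2f}")
```

In practice one sweeps the horizon k and compares probe accuracy against a shuffled-label baseline; a causal claim additionally requires intervening on the decoded representation and observing the predicted change in behavior, which a read-only probe like this cannot establish.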