On Sample-Efficient Generalized Planning via Learned Transition Models

📅 2026-02-26
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses key limitations of existing Transformer-based approaches to generalized planning—namely, poor out-of-distribution generalization, low sample efficiency, and state drift over long horizons. The authors propose reframing generalized planning as the explicit learning of a state transition model, where a neural network approximates the successor state function and generates plans by autoregressively predicting intermediate world states rather than directly outputting action sequences. By integrating symbolic state trajectory unfolding, relational graph encoding, and multi-representation state modeling, the method achieves significantly higher success rates in finding satisficing solutions out-of-distribution across multiple domains. Notably, it does so using fewer training samples and smaller model sizes compared to current direct action prediction approaches.
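The core idea in the summary, generating a plan by autoregressively predicting successor states and then reading actions off consecutive state pairs, can be sketched as follows. This is a minimal illustration, not the authors' implementation: `predict_successor` stands in for the learned model $\hat{\gamma}$, and `match_action` is a hypothetical helper that recovers the action explaining a predicted transition (returning `None` on state drift, i.e., when no legal action produces the predicted state).

```python
from typing import Callable, FrozenSet, List, Optional

State = FrozenSet[str]  # a symbolic state as a frozen set of ground atoms

def rollout_plan(
    initial: State,
    goal: FrozenSet[str],
    predict_successor: Callable[[State], State],          # learned model, approximates gamma
    match_action: Callable[[State, State], Optional[str]], # recovers the action behind a transition
    max_steps: int = 50,
) -> Optional[List[str]]:
    """Autoregressively unroll predicted states until the goal holds,
    then return the action sequence read off consecutive state pairs."""
    state, plan = initial, []
    for _ in range(max_steps):
        if goal <= state:               # every goal atom is satisfied
            return plan
        nxt = predict_successor(state)
        action = match_action(state, nxt)
        if action is None:              # predicted state matches no legal action: drift detected
            return None
        plan.append(action)
        state = nxt
    return None                         # horizon exhausted without reaching the goal
```

In this framing the network never outputs actions directly; the plan is a by-product of the predicted state trajectory, which is what lets drift be detected and rejected at each step.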

📝 Abstract
Generalized planning studies the construction of solution strategies that generalize across families of planning problems sharing a common domain model, formally defined by a transition function $\gamma: S \times A \rightarrow S$. Classical approaches achieve such generalization through symbolic abstractions and explicit reasoning over $\gamma$. In contrast, recent Transformer-based planners, such as PlanGPT and Plansformer, largely cast generalized planning as direct action-sequence prediction, bypassing explicit transition modeling. While effective on in-distribution instances, these approaches typically require large datasets and model sizes, and often suffer from state drift in long-horizon settings due to the absence of explicit world-state evolution. In this work, we formulate generalized planning as a transition-model learning problem, in which a neural model explicitly approximates the successor-state function $\hat{\gamma} \approx \gamma$ and generates plans by rolling out symbolic state trajectories. Instead of predicting actions directly, the model autoregressively predicts intermediate world states, thereby learning the domain dynamics as an implicit world model. To study size-invariant generalization and sample efficiency, we systematically evaluate multiple state representations and neural architectures, including relational graph encodings. Our results show that learning explicit transition models yields higher out-of-distribution satisficing-plan success than direct action-sequence prediction in multiple domains, while achieving these gains with significantly fewer training instances and smaller models. This is an extended version of a short paper accepted at ICAPS 2026 under the same title.
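The abstract mentions relational graph encodings as one of the state representations evaluated. A common way to build such an encoding, shown here as a hedged sketch rather than the paper's actual pipeline, is to turn a symbolic state (a set of ground atoms such as `on(a,b)`) into a labeled graph: objects become nodes, binary predicates become typed edges, and unary predicates become node labels (stored below as self-loop edges). The atom syntax and the self-loop convention are illustrative assumptions.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

def state_to_relational_graph(
    atoms: List[str],
) -> Tuple[List[str], Dict[str, List[Tuple[str, str]]]]:
    """Encode ground atoms like 'on(a,b)' or 'clear(a)' as a labeled graph.

    Returns (nodes, edges) where nodes is the sorted object list and edges
    maps each predicate name to its list of (source, target) pairs; unary
    predicates are encoded as self-loops on their single argument.
    """
    nodes: set = set()
    edges: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for atom in atoms:
        pred, rest = atom.split("(", 1)
        args = [a.strip() for a in rest.rstrip(")").split(",")]
        nodes.update(args)
        if len(args) == 2:
            edges[pred].append((args[0], args[1]))
        else:                      # unary predicate -> node label as a self-loop
            edges[pred].append((args[0], args[0]))
    return sorted(nodes), dict(edges)
```

Such a graph can then be fed to a relational GNN or graph Transformer, which is what makes the representation size-invariant: the same parameters apply regardless of how many objects the instance contains.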
Problem

Research questions and friction points this paper is trying to address.

generalized planning
sample efficiency
transition model
out-of-distribution generalization
state drift
Innovation

Methods, ideas, or system contributions that make the work stand out.

transition model learning
generalized planning
state prediction
sample efficiency
symbolic state trajectory