IMWM: Intuition Models Complement World Models for Latent Planning

📅 2026-05-31

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses planning failures in pixel-based control that arise when a sample-based planner relies on a world model alone under a finite sampling budget. The authors propose IMWM (Intuition Model + World Model), which pairs the world model with an intuition model trained from demonstrations. IMWM adds three lightweight components (Retrieval Initialization, a Hybrid Cost, and a Reliability Gate) so that data-driven priors guide the search. Across four pixel-based goal-reaching tasks, IMWM achieves higher mean success than the world-model-only planner on all four. The largest gains are on Two-Room and OGBench-Cube, where it reaches success rates of 99.2% and 94.7%, improvements of 11.5 and 28.5 percentage points over that baseline, respectively.

📝 Abstract

Planning with a learned latent world model is a promising route to control from raw pixels, but a strong world model alone is not enough. We show this experimentally: even with a perfect world model (operationalized by replacing the learned forward predictor with an idealized rollout of the true environment dynamics), a finite-budget sample-based planner still fails on some tasks, indicating that the bottleneck can lie in search rather than in world-model accuracy. Motivated by this gap, we propose IMWM (Intuition Model + World Model), which pairs the world model with an intuition model trained from demonstrations to recognize promising actions. The two models collaborate through three lightweight components: (i) Retrieval Initialization, which initializes the planner's action proposal from a retrieved demonstration; (ii) Hybrid Cost, which combines the intuition score with the world-model rollout cost; and (iii) a Reliability Gate, which adjusts how much the planner trusts intuition in each setting. Across four pixel-based goal-reaching tasks (Two-Room, Reacher, Push-T, and OGBench-Cube), IMWM has higher mean success than the world-model-only planner on all four, with the largest gains on Two-Room (99.2%, +11.5 percentage points) and OGBench-Cube (94.7%, +28.5 percentage points).
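
To make the three components concrete, here is a minimal sketch of how they could fit into a cross-entropy-style sample-based planner. It is not the paper's implementation. Everything in it is an assumption made for illustration: the toy 2-D dynamics (next state = state + action) standing in for the world model, the nearest-neighbor retrieval key, the exponential form of the gate, the demo-similarity stand-in for a learned intuition model, and all names and parameters (retrieve, reliability_gate, intuition_score, plan, lam, tau).

```python
import numpy as np

def rollout_cost(start, actions, goal):
    """World-model term: distance to goal after a toy rollout s' = s + a (assumed dynamics)."""
    final = start + actions.sum(axis=-2)          # actions: (..., H, 2)
    return np.linalg.norm(final - goal, axis=-1)

def retrieve(demos, start, goal):
    """Retrieval initialization: pick the demo whose (start, goal) is nearest the query."""
    query = np.concatenate([start, goal])
    dists = [np.linalg.norm(query - np.concatenate([d["start"], d["goal"]])) for d in demos]
    i = int(np.argmin(dists))
    return demos[i]["actions"], float(dists[i])

def reliability_gate(retrieval_dist, tau=1.0):
    """Gate in (0, 1]: trust intuition less when the retrieved demo is far from the query."""
    return float(np.exp(-retrieval_dist / tau))

def intuition_score(actions, demo_actions):
    """Hypothetical stand-in for a learned intuition model: higher when actions resemble the demo."""
    return -np.sqrt(((actions - demo_actions) ** 2).sum(axis=(-2, -1)))

def plan(start, goal, demos, n_samples=32, n_iters=5, n_elite=8, lam=0.5, seed=0):
    rng = np.random.default_rng(seed)
    demo_actions, dist = retrieve(demos, start, goal)
    gate = reliability_gate(dist)
    # The proposal distribution starts at the retrieved demonstration, not at zero.
    mean, std = demo_actions.copy(), np.full_like(demo_actions, 0.5)
    for _ in range(n_iters):
        samples = mean + std * rng.standard_normal((n_samples,) + mean.shape)
        # Hybrid cost: rollout cost minus the gated, weighted intuition score.
        cost = rollout_cost(start, samples, goal) - gate * lam * intuition_score(samples, demo_actions)
        elite = samples[np.argsort(cost)[:n_elite]]
        mean, std = elite.mean(axis=0), elite.std(axis=0) + 1e-3
    return mean, gate

demos = [
    {"start": np.zeros(2), "goal": np.ones(2), "actions": np.full((4, 2), 0.25)},
    {"start": np.zeros(2), "goal": -np.ones(2), "actions": np.full((4, 2), -0.25)},
]
start, goal = np.zeros(2), np.array([1.2, 0.8])
plan_actions, gate = plan(start, goal, demos)
print("gate:", gate, "final distance:", rollout_cost(start, plan_actions, goal))
```

In this sketch the gate scales only the intuition term, so a poorly matched demonstration makes the cost fall back toward the plain world-model objective. How the actual Reliability Gate is computed and where it acts is not specified in the abstract.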

Problem

Research questions and friction points this paper is trying to address.

latent planning

world model

sample-based planning

search bottleneck

pixel-based control

Innovation

Methods, ideas, or system contributions that make the work stand out.

Intuition Model

World Model

Latent Planning