WorldPlanner: Monte Carlo Tree Search and MPC with Action-Conditioned Visual World Models

📅 2025-11-04
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the limitations of behavioral cloning—namely, poor task transferability and high dependence on large, expert-labeled datasets—by proposing a vision-based world model framework for autonomous planning. Methodologically, it introduces an action-conditioned visual world model trained on minimal unstructured play data to capture environment dynamics; a diffusion-model-driven action sampler to mitigate hallucination in multi-step prediction; and the first integration of Monte Carlo Tree Search (MCTS) with zeroth-order model predictive control (MPC) for end-to-end, long-horizon visual–action joint optimization. An optional reward model can be incorporated to enhance planning robustness. Evaluated on three real-robot manipulation tasks, the framework significantly outperforms behavioral cloning baselines in success rate and cross-task generalization. It establishes a new paradigm for data-efficient, generalizable robotic planning.
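The diffusion-based action sampler constrains planning to actions that look like the play data, so that multi-step world-model rollouts stay in-distribution. A minimal sketch of the idea, using a toy 1-D "play-data" action distribution for which the optimal DDPM noise predictor has a closed form (all names and the Gaussian setup below are illustrative assumptions, not the paper's implementation):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy DDPM-style action sampler. The paper trains a diffusion model on
# play-data actions; here the "dataset" is a 1-D Gaussian N(MU, S0^2),
# for which the optimal noise predictor E[eps | x_t] is known in closed
# form, so no training is needed for this sketch.
MU, S0 = 0.5, 0.1            # mean / std of the (toy) play-data actions
T = 50                       # number of diffusion steps
betas = np.linspace(1e-4, 0.2, T)
alphas = 1.0 - betas
abar = np.cumprod(alphas)    # cumulative alpha-bar schedule

def eps_hat(x, t):
    """Closed-form optimal noise prediction for Gaussian data."""
    var_t = abar[t] * S0**2 + (1.0 - abar[t])
    return (x - np.sqrt(abar[t]) * MU) * np.sqrt(1.0 - abar[t]) / var_t

def sample_actions(n):
    """Ancestral (reverse-process) sampling, DDPM-style."""
    x = rng.standard_normal(n)                 # start from pure noise
    for t in range(T - 1, -1, -1):
        z = rng.standard_normal(n) if t > 0 else 0.0
        x = (x - betas[t] / np.sqrt(1.0 - abar[t]) * eps_hat(x, t)) \
            / np.sqrt(alphas[t]) + np.sqrt(betas[t]) * z
    return x

actions = sample_actions(5000)
sample_mean = float(actions.mean())
```

The sampled actions concentrate around the play-data distribution; a planner that draws its candidates from such a sampler never proposes the far-out-of-distribution actions on which the world model is most prone to hallucinate.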

📝 Abstract
Robots must understand their environment from raw sensory inputs and reason about the consequences of their actions in it to solve complex tasks. Behavior Cloning (BC) leverages task-specific human demonstrations to learn this knowledge as end-to-end policies. However, these policies are difficult to transfer to new tasks, and generating training data is challenging because it requires careful demonstrations and frequent environment resets. In contrast to such a policy-based view, in this paper we take a model-based approach where we collect a few hours of unstructured easy-to-collect play data to learn an action-conditioned visual world model, a diffusion-based action sampler, and optionally a reward model. The world model -- in combination with the action sampler and a reward model -- is then used to optimize long sequences of actions with a Monte Carlo Tree Search (MCTS) planner. The resulting plans are executed on the robot via a zeroth-order Model Predictive Controller (MPC). We show that the action sampler mitigates hallucinations of the world model during planning and validate our approach on 3 real-world robotic tasks with varying levels of planning and modeling complexity. Our experiments support the hypothesis that planning leads to a significant improvement over BC baselines on a standard manipulation test environment.
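Zeroth-order MPC of the kind the abstract describes needs no gradients through the world model: sample candidate action sequences, roll each through the model, score the imagined outcomes, execute only the first action, and replan. A minimal random-shooting sketch, with a hypothetical 2-D point-mass standing in for the learned visual world model (the paper's model predicts future visual latents; everything here is a simplifying assumption):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for the learned action-conditioned world model: the
# "state" is a 2-D point and actions are bounded displacements.
def world_model_step(state, action):
    return state + np.clip(action, -0.2, 0.2)

def reward(state, goal):
    return -np.linalg.norm(state - goal)

def random_shooting_mpc(state, goal, horizon=10, n_samples=256):
    """Zeroth-order MPC: sample action sequences, roll them out through
    the world model, and return the first action of the best sequence."""
    # Candidate sequences, shape (n_samples, horizon, action_dim).
    candidates = rng.uniform(-0.2, 0.2, size=(n_samples, horizon, 2))
    returns = np.zeros(n_samples)
    for i, seq in enumerate(candidates):
        s = state
        for a in seq:
            s = world_model_step(s, a)
            returns[i] += reward(s, goal)
    best = candidates[np.argmax(returns)]
    return best[0]   # receding horizon: execute only the first action

# Closed-loop execution: replan at every step.
state, goal = np.zeros(2), np.array([1.0, 1.0])
for _ in range(20):
    state = world_model_step(state, random_shooting_mpc(state, goal))

final_distance = float(np.linalg.norm(state - goal))
```

Replanning at every step is what makes the open-loop imagined rollouts safe to use: modeling errors are corrected as soon as the next real observation arrives.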
Problem

Research questions and friction points this paper is trying to address.

Robots need to understand environments and reason about action consequences
Behavior Cloning policies are difficult to transfer across different tasks
Planning with world models requires mitigating hallucinations during optimization
Innovation

Methods, ideas, or system contributions that make the work stand out.

Action-conditioned visual world model learning
Diffusion-based action sampler for planning
Monte Carlo Tree Search with MPC execution
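The MCTS component listed above can be sketched in its standard UCT form, with the learned world model supplying the simulator and the (optional) reward model scoring imagined states. The 1-D integer world, discrete action set, and goal below are illustrative assumptions for a self-contained example, not the paper's setup (which plans over sampled continuous actions in a visual latent space):

```python
import math
import random

random.seed(0)

ACTIONS = (-1, 0, +1)          # hypothetical discrete action set
GOAL = 5

def step(state, action):       # stand-in for the learned world model
    return state + action

def reward(state):             # stand-in for the (optional) reward model
    return 1.0 if state == GOAL else 0.0

class Node:
    def __init__(self, state, parent=None):
        self.state, self.parent = state, parent
        self.children = {}     # action -> Node
        self.visits, self.value = 0, 0.0

def uct_select(node, c=1.4):
    """Pick the child maximizing the UCB1 score."""
    return max(node.children.items(),
               key=lambda kv: kv[1].value / (kv[1].visits + 1e-9)
               + c * math.sqrt(math.log(node.visits + 1) / (kv[1].visits + 1e-9)))

def rollout(state, depth=10):
    """Random simulation through the world model from a leaf state."""
    total = 0.0
    for _ in range(depth):
        state = step(state, random.choice(ACTIONS))
        total += reward(state)
    return total

def mcts(root_state, iterations=1000):
    root = Node(root_state)
    for _ in range(iterations):
        node = root
        # Selection: descend while every action has been tried.
        while len(node.children) == len(ACTIONS):
            _, node = uct_select(node)
        # Expansion: add one untried action.
        a = random.choice([a for a in ACTIONS if a not in node.children])
        child = Node(step(node.state, a), node)
        node.children[a] = child
        # Simulation + backpropagation.
        g = reward(child.state) + rollout(child.state)
        while child is not None:
            child.visits += 1
            child.value += g
            child = child.parent
    # Recommend the root action with the most visits.
    return max(root.children.items(), key=lambda kv: kv[1].visits)[0]

best_action = mcts(root_state=0)
```

The tree search handles the long-horizon, multi-modal part of the problem; the MPC layer then tracks the resulting plan on the real robot, replanning locally as observations come in.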
👥 Authors
R. Khorrambakht
Center for Robotics and Embodied Intelligence, Tandon School of Engineering, New York University, Brooklyn, NY
Joaquim Ortiz-Haro
TU Berlin, IMPRS-IS
Joseph Amigo
Center for Robotics and Embodied Intelligence, Tandon School of Engineering, New York University, Brooklyn, NY
Omar Mostafa
Center for Robotics and Embodied Intelligence, Tandon School of Engineering, New York University, Brooklyn, NY
Daniel Dugas
PhD, Autonomous Systems Lab, ETH Zurich
Franziska Meier
Research Scientist, Facebook AI Research
Ludovic Righetti
New York University and Artificial and Natural Intelligence Toulouse Institute