AR Forcing: Towards Long-Horizon Robot Navigation World Model

📅 2026-05-29

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the distribution shift between training and autoregressive inference in diffusion-based world models for robot navigation. The authors propose AR Forcing, an autoregressive training strategy in which the context is updated at each step with the model's own predictions while the standard diffusion noise-prediction objective is optimized. AR Forcing requires no additional discriminator or distribution-matching loss and preserves the original diffusion framework and sampler. It is the first approach to explicitly align long-horizon inference state distributions during training. Evaluated on multiple navigation datasets (RECON, SCAND, HuRoN, and TartanDrive), the method improves long-horizon image generation consistency and trajectory prediction accuracy over strong baselines, making the model more robust in both complex known environments and unseen ones.

📝 Abstract

Diffusion-based robot navigation world models are typically trained with parallel supervision, whereas path planning relies on autoregressive inference. This mismatch causes a distribution shift between training and inference that destabilizes long-horizon prediction. We propose AR Forcing, an autoregressive training strategy that integrates the standard diffusion loss into the autoregressive training loop. At each step, the model uses its own predictions to update the context and optimizes the single-step noise prediction objective, thereby explicitly exposing the model to the inference state distribution during training. Our method requires no additional discriminators or distribution-matching losses, retains the original diffusion framework and sampler, and is easy to integrate. Experiments on multi-domain navigation datasets (RECON, SCAND, HuRoN, TartanDrive) show that, compared with strong baselines, AR Forcing improves the consistency of generated images during long-horizon navigation and the accuracy of predicted trajectories, enhancing the robustness of the model in complex known and unknown environments. We will release the code soon.
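
The training loop described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation (their code is not yet released); it only shows the general pattern of optimizing a single-step noise-prediction loss while the context is rolled forward with the model's own predictions. Everything here is an assumption made for illustration: frames are small vectors rather than images, `LinearDenoiser` is a hypothetical linear noise predictor, the self-predicted frame is a clipped one-step estimate of the clean frame rather than the output of a full sampler, and the noise schedule values are arbitrary.

```python
import numpy as np

def make_alpha_bar(num_levels=50, beta_min=1e-4, beta_max=0.05):
    """Cumulative signal fraction for a linear beta schedule (assumed values)."""
    betas = np.linspace(beta_min, beta_max, num_levels)
    return np.cumprod(1.0 - betas)

class LinearDenoiser:
    """Hypothetical toy noise predictor: eps_hat = W @ [x_noisy, context, t/T]."""
    def __init__(self, dim, ctx_len, rng):
        self.W = 0.01 * rng.standard_normal((dim, dim * (ctx_len + 1) + 1))

    def features(self, x_noisy, context, t_frac):
        return np.concatenate([x_noisy, np.concatenate(context), [t_frac]])

    def predict_eps(self, x_noisy, context, t_frac):
        return self.W @ self.features(x_noisy, context, t_frac)

def ar_training_pass(model, traj, ctx_len, alpha_bar, rng, lr=1e-2):
    """One autoregressive pass over a trajectory; returns the mean step loss."""
    # The first ctx_len frames are ground truth; later context is self-predicted.
    context = [frame.copy() for frame in traj[:ctx_len]]
    losses = []
    for k in range(ctx_len, len(traj)):
        target = traj[k]
        # Standard diffusion step: sample a noise level and corrupt the target.
        t = rng.integers(len(alpha_bar))
        a = alpha_bar[t]
        eps = rng.standard_normal(target.shape)
        x_noisy = np.sqrt(a) * target + np.sqrt(1.0 - a) * eps
        t_frac = t / len(alpha_bar)

        # Single-step noise-prediction loss (mean squared error) and SGD update.
        feats = model.features(x_noisy, context, t_frac)
        err = model.W @ feats - eps
        losses.append(float(np.mean(err ** 2)))
        model.W -= lr * (2.0 / err.size) * np.outer(err, feats)

        # Assumption: a one-step clean-frame estimate stands in for a sampler
        # rollout. It is treated as a constant (no gradient flows through it).
        eps_hat = model.predict_eps(x_noisy, context, t_frac)
        x0_hat = (x_noisy - np.sqrt(1.0 - a) * eps_hat) / np.sqrt(a)
        x0_hat = np.clip(x0_hat, -3.0, 3.0)  # keep the toy rollout bounded

        # Slide the context window forward using the model's own prediction.
        context = context[1:] + [x0_hat]
    return float(np.mean(losses))

# Usage: a synthetic 12-frame trajectory of 4-dimensional "frames".
rng = np.random.default_rng(0)
traj = [np.sin(0.3 * k + np.arange(4)) for k in range(12)]
model = LinearDenoiser(dim=4, ctx_len=2, rng=rng)
alpha_bar = make_alpha_bar()
for epoch in range(5):
    print(epoch, ar_training_pass(model, traj, 2, alpha_bar, rng))
```

The point of the sketch is the last line of the loop: a parallel-supervision scheme would append the ground-truth frame `traj[k]` to the context, whereas here the model's own estimate is appended, so later steps are trained on the kind of context the model sees at inference time.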

Problem

Research questions and friction points this paper is trying to address.

distribution shift

long-horizon navigation

robot world model

autoregressive inference

diffusion models

Innovation

Methods, ideas, or system contributions that make the work stand out.

AR Forcing

autoregressive training

diffusion world model