AI Summary
This work addresses safety evaluation for autonomous driving in long-tail scenarios, where existing reconstruction-based neural simulators generalize poorly beyond their captured data. The authors propose OmniDreams, a real-time generative world model built on an action-conditioned autoregressive diffusion framework and adapted from the large-scale Cosmos video diffusion model through mid- and post-training for autonomous driving simulation. Trained on 21,000 hours of driving data, the model synthesizes scenarios that were never observed, including extreme weather and unpredictable dynamic agent behaviors. It is deployed in a closed-loop system with the Alpamayo 1 policy and the AlpaSim orchestrator. In preliminary results, a world-action model (WAM) post-trained from OmniDreams surpasses the VLA-based Alpamayo 1.5 policy on the Physical AI Autonomous Vehicles NuRec dataset while using only one fifth of the parameters, which suggests its potential as a policy backbone.
Abstract
As autonomous vehicle capabilities advance, the safe evaluation of driving policies in long-tail scenarios remains a critical bottleneck. In closed-loop simulation, the driving policy model actively interacts with the environment, where its actions dynamically update the simulator state and directly influence the next set of generated sensor observations. While recent reconstruction-based neural simulators offer photorealism, they are fundamentally constrained by their initial captured data and struggle to generalize to highly dynamic or novel scenes. To overcome these limitations, we introduce OmniDreams, a foundation generative world model mid- and post-trained from the Cosmos diffusion model to autoregressively generate action-conditioned videos in real time. By leveraging the rich visual priors of Cosmos and mid- and post-training on 21k hours of driving scenarios, OmniDreams synthesizes complex, unobserved phenomena that are hard for traditional simulators to capture, such as extreme weather and unpredictable dynamic agent behaviors. Crucially, it autoregressively conditions its photorealistic sensor generation on past frames, the current simulator state, and immediate driving actions. Deployed in a closed-loop system with the Alpamayo 1 policy model and AlpaSim orchestrator, OmniDreams acts as a highly responsive, reactive environment, providing a scalable and comprehensive solution for training and evaluating next-generation autonomous driving policies. We additionally show preliminary results indicating that a world-action model (WAM) post-trained from OmniDreams achieves strong performance on the Physical AI Autonomous Vehicles NuRec dataset, surpassing the VLA-based Alpamayo 1.5 research policy model while using only 1/5 the total parameters. These results highlight the potential for a real-time world model like OmniDreams to also serve as a backbone for policy architectures.
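The closed-loop interaction described in the abstract can be illustrated with a minimal sketch. It is not the OmniDreams, Alpamayo, or AlpaSim implementation: `SimState`, `toy_policy`, `toy_world_model`, `step_state`, and `closed_loop_rollout` are hypothetical stand-ins, and a "frame" here is a small dictionary, not an image. The sketch only shows the data flow: the policy acts on the latest observation, the action updates the simulator state, and the next observation is generated from past frames, the current state, and the immediate action.

```python
from collections import deque
from dataclasses import dataclass
import random


@dataclass
class SimState:
    """Hypothetical simulator state: ego position (m) and speed (m/s)."""
    x: float = 0.0
    speed: float = 0.0


def toy_policy(frame):
    """Hypothetical policy: accelerate toward a 10 m/s target speed."""
    return 0.5 * (10.0 - frame["speed_cue"])


def toy_world_model(past_frames, state, action, rng):
    """Stand-in for the generative model (assumption, not the real one).

    The next frame depends on past frames, the current simulator state,
    and the immediate action, mirroring the conditioning in the abstract.
    """
    prev_cue = past_frames[-1]["speed_cue"] if past_frames else state.speed
    cue = 0.8 * state.speed + 0.2 * prev_cue + rng.gauss(0.0, 0.05)
    return {"speed_cue": cue, "last_action": action}


def step_state(state, accel, dt):
    """Advance the simulator state using the policy's action."""
    speed = max(0.0, state.speed + accel * dt)
    return SimState(x=state.x + speed * dt, speed=speed)


def closed_loop_rollout(steps=50, context_len=4, dt=0.1, seed=0):
    rng = random.Random(seed)
    state = SimState()
    context = deque(maxlen=context_len)  # sliding window of past frames
    context.append(toy_world_model(context, state, 0.0, rng))
    log = []
    for _ in range(steps):
        action = toy_policy(context[-1])          # policy reads observation
        state = step_state(state, action, dt)     # action updates the state
        frame = toy_world_model(context, state, action, rng)
        context.append(frame)                     # frame feeds the next step
        log.append((state, action))
    return log


if __name__ == "__main__":
    final_state, _ = closed_loop_rollout()[-1]
    print(final_state)
```

The bounded `deque` is one simple way to keep a fixed-length frame history for autoregressive conditioning; the real system's context handling is not specified in the abstract.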