Driving in the Occupancy World: Vision-Centric 4D Occupancy Forecasting and Planning via World Models for Autonomous Driving

📅 2024-08-26
🏛️ arXiv.org
📈 Citations: 3
Influential: 0
🤖 AI Summary
To address the challenge of tightly coupling vision-driven 4D spatiotemporal prediction with motion planning in autonomous driving, this paper proposes the first vision-centric 4D occupancy world model jointly trained with end-to-end planning. Methodologically, it introduces a semantic- and motion-aware memory normalization module and a geometry-aware spatiotemporal decoder atop BEV feature encoding, enabling controllable joint generation of 4D semantic occupancy and motion flow; explicit control signals (e.g., speed, steering angle) are incorporated via action-condition injection. The resulting 4D occupancy output is embedded directly into a differentiable planner, where trajectory optimization is guided by an occupancy-based cost function. Evaluated on three benchmarks (nuScenes, nuScenes-Occupancy, and Lyft-Level5), the framework achieves significant improvements in 4D prediction fidelity, planning quality, and cross-scenario generalization. This work establishes a novel paradigm for generative driving world modeling and closed-loop planning.
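The "semantic- and motion-conditional normalization" described above can be pictured as a feature-wise conditional normalization: BEV features are normalized, then rescaled and shifted by affine parameters predicted from semantic and motion cues. The sketch below is a minimal numpy illustration of that general idea, not the paper's implementation; the function name `conditional_norm` and the weight matrices `W_gamma`/`W_beta` are assumptions introduced for illustration.

```python
import numpy as np

def conditional_norm(bev_feats, condition, W_gamma, W_beta, eps=1e-5):
    """Layer-normalize BEV tokens, then modulate them with a condition vector.

    bev_feats:  (B, N, C) flattened BEV feature tokens.
    condition:  (B, D) semantic/motion condition vector (hypothetical shape).
    W_gamma, W_beta: (D, C) projections producing per-channel scale and shift.
    """
    # Plain layer normalization over the channel dimension.
    mean = bev_feats.mean(axis=-1, keepdims=True)
    var = bev_feats.var(axis=-1, keepdims=True)
    normed = (bev_feats - mean) / np.sqrt(var + eps)

    # Condition-dependent affine modulation (FiLM/SPADE-style).
    gamma = condition @ W_gamma          # (B, C)
    beta = condition @ W_beta            # (B, C)
    return normed * (1.0 + gamma[:, None, :]) + beta[:, None, :]
```

With zero projection weights the modulation vanishes and the function reduces to ordinary layer normalization, which makes the conditional behaviour easy to sanity-check.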

📝 Abstract
World models envision potential future states based on various ego actions. They embed extensive knowledge about the driving environment, facilitating safe and scalable autonomous driving. Most existing methods primarily focus on either data generation or the pretraining paradigms of world models. Unlike the aforementioned prior works, we propose Drive-OccWorld, which adapts a vision-centric 4D forecasting world model to end-to-end planning for autonomous driving. Specifically, we first introduce a semantic and motion-conditional normalization in the memory module, which accumulates semantic and dynamic information from historical BEV embeddings. These BEV features are then conveyed to the world decoder for future occupancy and flow forecasting, considering both geometry and spatiotemporal modeling. Additionally, we propose injecting flexible action conditions, such as velocity, steering angle, trajectory, and commands, into the world model to enable controllable generation and facilitate a broader range of downstream applications. Furthermore, we explore integrating the generative capabilities of the 4D world model with end-to-end planning, enabling continuous forecasting of future states and the selection of optimal trajectories using an occupancy-based cost function. Comprehensive experiments conducted on the nuScenes, nuScenes-Occupancy, and Lyft-Level5 datasets illustrate that our method can generate plausible and controllable 4D occupancy, paving the way for advancements in driving world generation and end-to-end planning. Project page: https://drive-occworld.github.io/
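The abstract's "selection of optimal trajectories using an occupancy-based cost function" can be sketched simply: rasterize each candidate trajectory into the forecast occupancy grid, accumulate the predicted occupancy probability along its waypoints, and pick the cheapest candidate. The snippet below is a minimal sketch of that idea under assumed conventions (a 0.5 m BEV grid centred on the ego vehicle; function names are hypothetical), not the paper's planner.

```python
import numpy as np

def occupancy_cost(trajectory, occupancy, resolution=0.5, grid_origin=(-50.0, -50.0)):
    """Accumulate predicted occupancy probability along a candidate trajectory.

    trajectory: (T, 2) future (x, y) waypoints in metres, ego frame.
    occupancy:  (T, H, W) per-timestep forecast occupancy probabilities.
    """
    cost = 0.0
    for t, (x, y) in enumerate(trajectory):
        col = int((x - grid_origin[0]) / resolution)
        row = int((y - grid_origin[1]) / resolution)
        if 0 <= row < occupancy.shape[1] and 0 <= col < occupancy.shape[2]:
            cost += occupancy[t, row, col]
        else:
            cost += 1.0  # leaving the grid is treated as maximally unsafe
    return cost

def select_trajectory(candidates, occupancy):
    """Return the index of the candidate with the lowest occupancy cost."""
    costs = [occupancy_cost(traj, occupancy) for traj in candidates]
    return int(np.argmin(costs))
```

In the paper's closed-loop setting this evaluation would run on the world model's rolled-out forecasts at every planning step; the sketch keeps only the cost-and-argmin core of that idea.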
Problem

Research questions and friction points this paper is trying to address.

- 4D Prediction
- Visual Information
- Autonomous Driving

Innovation

Methods, ideas, or system contributions that make the work stand out.

- 4D Prediction
- Drive-OccWorld Model
- Autonomous Driving
👥 Authors

- Yu Yang (Zhejiang University)
- Jianbiao Mei (Zhejiang University; computer vision, deep learning)
- Yukai Ma (Zhejiang University)
- Siliang Du (Huawei Technologies)
- Wenqing Chen (Huawei Technologies)
- Yijie Qian (Zhejiang University)
- Yuxiang Feng (Zhejiang University)
- Yong Liu (Zhejiang University)