COME: Adding Scene-Centric Forecasting Control to Occupancy World Model

📅 2025-06-16
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing autonomous driving world models struggle to disentangle ego-vehicle motion (viewpoint transformation) from scene evolution (agent interactions), leading to distorted occupancy predictions. To address this, we propose a scene-centric disentangled modeling framework: (1) a novel scene-centric prediction branch that explicitly separates ego-irrelevant scene dynamics from ego-motion-induced viewpoint changes; (2) a customized ControlNet-based coordination mechanism enabling spatially consistent and controllable future occupancy generation; and (3) the first systematic integration of disentangled representation learning into occupancy-aware world models. Evaluated on nuScenes-Occ3D, our method achieves significant gains in mIoU, outperforming the state-of-the-art DOME and UniScene by +26.3% and +23.7%, respectively. Moreover, it demonstrates robust superiority across long-horizon predictions (3s/8s) and diverse input modalities (camera-only, sensor-fused, or ground-truth inputs).

📝 Abstract
World models are critical for autonomous driving to simulate environmental dynamics and generate synthetic data. Existing methods struggle to disentangle ego-vehicle motion (perspective shifts) from scene evolution (agent interactions), leading to suboptimal predictions. Instead, we propose to separate environmental changes from ego-motion by leveraging scene-centric coordinate systems. In this paper, we introduce COME: a framework that integrates scene-centric forecasting Control into the Occupancy world ModEl. Specifically, COME first generates ego-irrelevant, spatially consistent future features through a scene-centric prediction branch, which are then converted into a scene condition using a tailored ControlNet. These condition features are subsequently injected into the occupancy world model, enabling more accurate and controllable future occupancy predictions. Experimental results on the nuScenes-Occ3D dataset show that COME achieves consistent and significant improvements over state-of-the-art (SOTA) methods across diverse configurations, including different input sources (ground-truth, camera-based, and fusion-based occupancy) and prediction horizons (3s and 8s). For example, under the same settings, COME achieves a 26.3% better mIoU than DOME and a 23.7% better mIoU than UniScene. These results highlight the efficacy of disentangled representation learning in enhancing spatio-temporal prediction fidelity for world models. Code and videos will be available at https://github.com/synsin0/COME.
Problem

Research questions and friction points this paper is trying to address.

Disentangle ego-motion from scene dynamics in autonomous driving
Improve future occupancy prediction accuracy using scene-centric controls
Enhance spatio-temporal prediction fidelity in world models
Innovation

Methods, ideas, or system contributions that make the work stand out.

Scene-centric coordinate systems for disentanglement
ControlNet for scene condition conversion
Occupancy world model integration for prediction
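The three-stage dataflow listed above (scene-centric forecasting, ControlNet-based condition conversion, and injection into the occupancy world model) can be sketched roughly as follows. This is a minimal illustrative sketch, not the authors' implementation: the function names (`scene_centric_branch`, `controlnet_adapter`, `occupancy_world_model`), the grid shapes, and the trivial stand-in computations are all hypothetical placeholders for the learned modules described in the paper.

```python
import numpy as np

# Hypothetical occupancy grid dimensions (H x W x Z semantic voxels).
H, W, Z = 16, 16, 4

def scene_centric_branch(past_occ, ego_poses):
    """Stage 1 (sketch): predict ego-irrelevant future features in a
    scene-centric (world-fixed) frame. A trivial persistence forecast
    (repeat the last observed grid) stands in for the learned predictor;
    ego_poses would be used to warp frames into the shared scene frame."""
    return past_occ[-1]

def controlnet_adapter(scene_features):
    """Stage 2 (sketch): convert scene-centric features into condition
    features, as a ControlNet-style side branch would. A dummy scaling
    stands in for the trainable adapter layers."""
    return 0.5 * scene_features

def occupancy_world_model(past_occ, condition):
    """Stage 3 (sketch): the base world model predicts future occupancy,
    with the scene condition injected additively (a stand-in for the
    feature-level injection described in the paper)."""
    base_prediction = past_occ[-1]
    return base_prediction + condition

past = np.random.rand(3, H, W, Z)   # 3 past occupancy frames
poses = np.zeros((3, 4, 4))         # dummy ego poses (4x4 transforms)

scene_feat = scene_centric_branch(past, poses)
cond = controlnet_adapter(scene_feat)
future = occupancy_world_model(past, cond)
print(future.shape)  # (16, 16, 4)
```

The point of the sketch is the separation of concerns: stage 1 operates in a coordinate frame decoupled from the ego vehicle, so stages 2 and 3 receive spatially consistent features regardless of ego-motion.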
Authors
Yining Shi
School of Vehicle and Mobility, Tsinghua University, Beijing, China; State Key Laboratory of Intelligent Green Vehicle and Mobility, Beijing, China
Kun Jiang
Tsinghua University
Qiang Meng
Kargobot Inc.
Ke Wang
Kargobot Inc.
Jiabao Wang
NWPU, NKU
Wenchao Sun
School of Vehicle and Mobility, Tsinghua University, Beijing, China; State Key Laboratory of Intelligent Green Vehicle and Mobility, Beijing, China
Tuopu Wen
School of Vehicle and Mobility, Tsinghua University, Beijing, China; State Key Laboratory of Intelligent Green Vehicle and Mobility, Beijing, China
Mengmeng Yang
School of Vehicle and Mobility, Tsinghua University, Beijing, China; State Key Laboratory of Intelligent Green Vehicle and Mobility, Beijing, China
Diange Yang
School of Vehicle and Mobility, Tsinghua University, Beijing, China; State Key Laboratory of Intelligent Green Vehicle and Mobility, Beijing, China