COME: Adding Scene-Centric Forecasting Control to Occupancy World Model

📅 2025-06-16

🤖 AI Summary
Existing autonomous driving world models struggle to disentangle ego-vehicle motion (viewpoint transformation) from scene evolution (agent interactions), which distorts occupancy predictions. COME addresses this with a scene-centric, disentangled modeling framework: (1) a scene-centric prediction branch that separates ego-irrelevant scene dynamics from ego-motion-induced viewpoint changes; (2) a tailored ControlNet that converts the scene-centric features into a condition for spatially consistent, controllable future occupancy generation; and (3) injection of that condition into the occupancy world model. On nuScenes-Occ3D, COME reports an mIoU that is 26.3% better than DOME and 23.7% better than UniScene under the same settings, with consistent gains across prediction horizons (3s and 8s) and input sources (ground-truth, camera-based, and fusion-based occupancy).

📝 Abstract
World models are critical for autonomous driving to simulate environmental dynamics and generate synthetic data. Existing methods struggle to disentangle ego-vehicle motion (perspective shifts) from scene evolution (agent interactions), leading to suboptimal predictions. Instead, we propose to separate environmental changes from ego-motion by leveraging scene-centric coordinate systems. In this paper, we introduce COME: a framework that integrates scene-centric forecasting Control into the Occupancy world ModEl. Specifically, COME first generates ego-irrelevant, spatially consistent future features through a scene-centric prediction branch; these features are then converted into a scene condition by a tailored ControlNet. The condition features are subsequently injected into the occupancy world model, enabling more accurate and controllable future occupancy predictions. Experimental results on the nuScenes-Occ3D dataset show that COME achieves consistent and significant improvements over state-of-the-art (SOTA) methods across diverse configurations, including different input sources (ground-truth, camera-based, and fusion-based occupancy) and prediction horizons (3s and 8s). For example, under the same settings, COME achieves a 26.3% better mIoU than DOME and a 23.7% better mIoU than UniScene. These results highlight the efficacy of disentangled representation learning in enhancing spatio-temporal prediction fidelity for world models. Code and videos will be available at https://github.com/synsin0/COME.
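
To make the ego-centric versus scene-centric distinction concrete, here is a minimal 2D sketch. It is not taken from the COME code; the function names, the `(x, y, yaw)` pose format, and the toy trajectory are all assumptions chosen for illustration. It shows that a static object keeps the same coordinates in a fixed scene frame, while its ego-frame coordinates change only because the ego vehicle moves.

```python
import math

def ego_to_scene(point_xy, ego_pose):
    """Map a point from the ego frame into a fixed scene frame.

    ego_pose = (x, y, yaw) is the ego position and heading expressed
    in the scene frame (hypothetical pose format for this sketch).
    """
    px, py = point_xy
    ex, ey, yaw = ego_pose
    c, s = math.cos(yaw), math.sin(yaw)
    return (ex + c * px - s * py, ey + s * px + c * py)

def scene_to_ego(point_xy, ego_pose):
    """Inverse of ego_to_scene: express a scene-frame point in the ego frame."""
    px, py = point_xy
    ex, ey, yaw = ego_pose
    c, s = math.cos(yaw), math.sin(yaw)
    dx, dy = px - ex, py - ey
    return (c * dx + s * dy, -s * dx + c * dy)

# Toy setup: a parked car fixed at (10, 2) in the scene frame, and an
# ego vehicle that drives 1 m per step along +x with zero heading.
static_obj_scene = (10.0, 2.0)
ego_poses = [(float(t), 0.0, 0.0) for t in range(4)]

# What an ego-centric model observes: the parked car appears to move.
ego_view = [scene_to_ego(static_obj_scene, pose) for pose in ego_poses]

# After compensating ego-motion, the parked car stays where it is.
scene_view = [ego_to_scene(p, pose) for p, pose in zip(ego_view, ego_poses)]

for t, (e, s) in enumerate(zip(ego_view, scene_view)):
    print(f"t={t}  ego frame: {e}  scene frame: {s}")
```

In the ego frame the apparent motion of the parked car is purely a viewpoint effect; in the scene frame, any remaining change would come from the agents themselves. That is the separation the abstract refers to.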
Problem

Research questions and friction points this paper is trying to address.

Disentangle ego-motion from scene dynamics in autonomous driving
Improve future occupancy prediction accuracy using scene-centric controls
Enhance spatio-temporal prediction fidelity in world models
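
Prediction accuracy in this listing is reported as mIoU. As a reference point, the sketch below computes the generic mean intersection-over-union over flattened semantic occupancy labels. It is an assumption-laden illustration: the `mean_iou` helper and the toy labels are hypothetical, and the actual nuScenes-Occ3D evaluation protocol (class selection, visibility masks) may differ.

```python
def mean_iou(pred, target, num_classes):
    """Mean IoU over classes present in either pred or target.

    pred and target are flat sequences of integer class labels,
    one per voxel (hypothetical layout for this sketch).
    """
    ious = []
    for c in range(num_classes):
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:  # skip classes absent from both grids
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0

# Toy example with six voxels and three classes.
pred = [0, 1, 1, 2, 2, 2]
target = [0, 1, 2, 2, 2, 1]
score = mean_iou(pred, target, num_classes=3)
print(f"mIoU = {score:.4f}")
```

Per-class IoUs here are 1, 1/3, and 1/2, so the mean is 11/18.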
Innovation

Methods, ideas, or system contributions that make the work stand out.

Scene-centric coordinate systems for disentanglement
ControlNet for scene condition conversion
Occupancy world model integration for prediction
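
The original ControlNet design adds condition features to a backbone through zero-initialized layers, so the conditioned model starts out identical to the unconditioned one. The sketch below illustrates only that idea, using a zero-initialized linear projection as a stand-in for a zero convolution. Whether COME's tailored ControlNet follows this exact scheme is not stated in the abstract, and all names and values here are hypothetical.

```python
def linear(x, weight, bias):
    """Plain matrix-vector product plus bias, on Python lists."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weight, bias)]

def inject_condition(backbone_feat, cond_feat, proj_weight, proj_bias):
    """Add a projected condition to the backbone feature as a residual."""
    residual = linear(cond_feat, proj_weight, proj_bias)
    return [h + r for h, r in zip(backbone_feat, residual)]

dim = 3
backbone_feat = [0.5, -1.0, 2.0]   # feature inside the world model (toy values)
scene_cond = [1.0, 2.0, 3.0]       # scene-centric condition feature (toy values)

# At initialization the projection is all zeros, so injection is a no-op
# and the backbone's behaviour is preserved.
zero_weight = [[0.0] * dim for _ in range(dim)]
zero_bias = [0.0] * dim
out_at_init = inject_condition(backbone_feat, scene_cond, zero_weight, zero_bias)

# After training the projection becomes non-zero (hand-set here), and the
# condition starts to steer the backbone feature.
trained_weight = [[0.1, 0.0, 0.0],
                  [0.0, 0.1, 0.0],
                  [0.0, 0.0, 0.1]]
out_trained = inject_condition(backbone_feat, scene_cond, trained_weight, zero_bias)

print("at init:", out_at_init)
print("trained:", out_trained)
```

Starting from a zero projection means the condition branch cannot disturb the backbone before it has learned anything useful.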
Yining Shi
School of Vehicle and Mobility, Tsinghua University, Beijing, China; State Key Laboratory of Intelligent Green Vehicle and Mobility, Beijing, China
Kun Jiang
Tsinghua University
autonomous driving
Qiang Meng
Professor, Department of Civil and Environmental Engineering, National University of Singapore
Transportation Network Modelling, Shipping and Intermodal Transportation, Quantitative Risk Assessment of Transport Operations
Ke Wang
Kargobot Inc.
Jiabao Wang
NWPU, NKU
Object Detection, Rotated Object Detection
Wenchao Sun
School of Vehicle and Mobility, Tsinghua University, Beijing, China; State Key Laboratory of Intelligent Green Vehicle and Mobility, Beijing, China
Tuopu Wen
School of Vehicle and Mobility, Tsinghua University, Beijing, China; State Key Laboratory of Intelligent Green Vehicle and Mobility, Beijing, China
Mengmeng Yang
School of Vehicle and Mobility, Tsinghua University, Beijing, China; State Key Laboratory of Intelligent Green Vehicle and Mobility, Beijing, China
Diange Yang
School of Vehicle and Mobility, Tsinghua University, Beijing, China; State Key Laboratory of Intelligent Green Vehicle and Mobility, Beijing, China