TrackOcc: Camera-based 4D Panoptic Occupancy Tracking

📅 2025-03-11
🤖 AI Summary
Purely vision-based perception tasks each capture only part of a dynamic scene. 3D object tracking lacks spatial completeness, and semantic occupancy prediction lacks temporal consistency. This gap hinders holistic dynamic scene understanding for advanced autonomous driving. To address it, we propose and formalize the new task of **4D Panoptic Occupancy Tracking**, which jointly models voxel-level semantics, instance identity, and how both evolve across frames. Methodologically, we design an end-to-end, camera-only Transformer architecture with a **4D spatiotemporal query mechanism** and a **differentiable localization-aware loss**, enabling joint optimization of full-scene spatial coverage and temporally coherent object trajectories. On the Waymo Open Dataset, our method achieves state-of-the-art performance: +12.7% AMOTA for tracking and +8.3% mIoU for occupancy prediction, advancing purely vision-based 4D scene understanding.
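
The summary reports occupancy quality as mIoU. As a reminder of what that metric computes, here is a minimal sketch of per-class voxel IoU averaged over classes. It assumes flattened integer label grids and an `ignore_index` of 255 for unobserved voxels; both the function name `voxel_miou` and these conventions are hypothetical, not details from the paper.

```python
from collections.abc import Sequence


def voxel_miou(pred: Sequence[int], gt: Sequence[int], num_classes: int,
               ignore_index: int = 255) -> float:
    """Mean IoU over semantic classes for flattened voxel label grids."""
    if len(pred) != len(gt):
        raise ValueError("pred and gt must have the same number of voxels")
    inter = [0] * num_classes
    union = [0] * num_classes
    for p, g in zip(pred, gt):
        if g == ignore_index:
            continue  # skip voxels without ground truth
        union[g] += 1
        if p == g:
            inter[g] += 1
        elif 0 <= p < num_classes:
            union[p] += 1  # a false positive for the predicted class
    # Average only over classes that appear in prediction or ground truth.
    ious = [i / u for i, u in zip(inter, union) if u > 0]
    return sum(ious) / len(ious) if ious else 0.0


gt = [0, 0, 1, 1, 2, 255]
pred = [0, 1, 1, 1, 2, 0]
print(voxel_miou(pred, gt, num_classes=3))  # ≈ 0.722 (= 13/18)
```

Benchmarks usually accumulate intersections and unions over a whole split before dividing, rather than averaging per-frame scores; the function above can be applied to concatenated frames to get that behavior.
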

📝 Abstract
Comprehensive and consistent dynamic scene understanding from camera input is essential for advanced autonomous systems. Traditional camera-based perception tasks such as 3D object tracking and semantic occupancy prediction lack either spatial comprehensiveness or temporal consistency. In this work, we introduce a new task, Camera-based 4D Panoptic Occupancy Tracking, which jointly addresses panoptic occupancy segmentation and object tracking from camera-only input. We further propose TrackOcc, an approach that processes image inputs in a streaming, end-to-end manner with 4D panoptic queries to solve the proposed task. With a localization-aware loss, TrackOcc improves the accuracy of 4D panoptic occupancy tracking without bells and whistles. Experimental results show that our method achieves state-of-the-art performance on the Waymo dataset. The source code will be released at https://github.com/Tsinghua-MARS-Lab/TrackOcc.
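
The abstract describes streaming, end-to-end processing with 4D panoptic queries. A common pattern in query-based streaming trackers, sketched here under that assumption, carries confident queries from frame to frame so their track IDs persist, while freshly initialized queries start new tracks. This is a generic toy: the class names, thresholds, and score inputs are hypothetical, and it is not TrackOcc's actual query update, which this page does not describe.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class TrackQuery:
    track_id: int
    score: float            # confidence predicted for this query in the current frame
    embedding: list[float]  # query feature carried to the next frame


@dataclass
class QueryTracker:
    """Toy streaming tracker: queries that stay confident keep their track IDs."""
    keep_thresh: float = 0.5  # hypothetical: drop a carried track below this score
    new_thresh: float = 0.6   # hypothetical: start a new track above this score
    active: list[TrackQuery] = field(default_factory=list)
    _ids: count = field(default_factory=count)

    def step(self, track_scores, track_embs, det_scores, det_embs):
        """Process one frame and return the active track IDs after the update.

        track_scores/track_embs are outputs for the queries carried over from
        the previous frame (same order as self.active); det_scores/det_embs
        are outputs for newly initialized detection queries.
        """
        survivors = [
            TrackQuery(q.track_id, s, e)
            for q, s, e in zip(self.active, track_scores, track_embs)
            if s >= self.keep_thresh
        ]
        newborn = [
            TrackQuery(next(self._ids), s, e)  # fresh ID only for kept detections
            for s, e in zip(det_scores, det_embs)
            if s >= self.new_thresh
        ]
        self.active = survivors + newborn
        return [q.track_id for q in self.active]


tracker = QueryTracker()
print(tracker.step([], [], [0.9, 0.3, 0.7], [[0.1], [0.2], [0.3]]))  # [0, 1]
print(tracker.step([0.8, 0.2], [[0.1], [0.3]], [0.65], [[0.5]]))      # [0, 2]
```

Because identity lives in the query itself, no separate association step is needed between frames; the trade-off is that a track that drops below `keep_thresh` for one frame is lost rather than re-matched.
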
Problem

Research questions and friction points this paper is trying to address.

Enhancing dynamic scene understanding from camera input
Combining panoptic occupancy segmentation with object tracking
Improving accuracy in 4D panoptic occupancy tracking
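
Combining panoptic occupancy with tracking means every occupied voxel must carry both a semantic class and an instance ID that stays consistent over time. One simple way to store such output, assumed here for illustration since the paper's format is not described on this page, borrows the SemanticKITTI label convention: a 32-bit label with the semantic class in the lower 16 bits and a sequence-consistent instance ID in the upper 16 bits, applied to voxels instead of points. The helper names and the use of instance ID 0 for "stuff" classes are hypothetical.

```python
SEM_BITS = 16
SEM_MASK = (1 << SEM_BITS) - 1  # 0xFFFF

Voxel = tuple[int, int, int]


def encode(semantic: int, instance: int) -> int:
    """Pack a semantic class and a track ID into one 32-bit label."""
    if not (0 <= semantic <= SEM_MASK and 0 <= instance <= SEM_MASK):
        raise ValueError("semantic and instance must each fit in 16 bits")
    return (instance << SEM_BITS) | semantic


def decode(label: int) -> tuple[int, int]:
    """Inverse of encode: returns (semantic, instance)."""
    return label & SEM_MASK, label >> SEM_BITS


def instance_voxels(frame: dict[Voxel, int], track_id: int) -> set[Voxel]:
    """Occupied voxels in a sparse frame whose label carries the given track ID."""
    return {xyz for xyz, label in frame.items() if decode(label)[1] == track_id}


# Hypothetical two-frame sequence: a car (class 1, track 7) moves one voxel
# along x, while a road voxel (class 3) has no instance (ID 0).
CAR, ROAD = 1, 3
frames = [
    {(0, 0, 0): encode(CAR, 7), (5, 5, 0): encode(ROAD, 0)},
    {(1, 0, 0): encode(CAR, 7), (5, 5, 0): encode(ROAD, 0)},
]
print([instance_voxels(f, 7) for f in frames])  # [{(0, 0, 0)}, {(1, 0, 0)}]
```
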
Innovation

Methods, ideas, or system contributions that make the work stand out.

Camera-based 4D Panoptic Occupancy Tracking
Streaming end-to-end image processing
Localization-aware loss enhances accuracy
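
Neither the summary nor the abstract defines the localization-aware loss. Purely as a hypothetical illustration of what a differentiable, localization-sensitive term can look like, here is a generic soft-IoU loss between one query's predicted per-voxel occupancy probabilities and its ground-truth instance mask. This widely used surrogate is swapped in for illustration and is not TrackOcc's actual loss; the name `soft_iou_loss` and the `eps` smoothing term are assumptions.

```python
from collections.abc import Sequence


def soft_iou_loss(probs: Sequence[float], target: Sequence[int],
                  eps: float = 1e-6) -> float:
    """1 - soft IoU between predicted voxel probabilities and a binary GT mask.

    Intersection and union use products and sums of probabilities instead of
    hard counts, so the expression is differentiable w.r.t. the probabilities.
    """
    if len(probs) != len(target):
        raise ValueError("probs and target must have the same length")
    inter = sum(p * t for p, t in zip(probs, target))
    union = sum(p + t - p * t for p, t in zip(probs, target))
    return 1.0 - (inter + eps) / (union + eps)


print(soft_iou_loss([1.0, 1.0, 0.0, 0.0], [1, 1, 0, 0]))  # 0.0 (perfect overlap)
print(soft_iou_loss([0.5, 0.5, 0.0, 0.0], [1, 1, 0, 0]))  # about 0.5
```

In a training framework the same expression would be written on tensors (for example in PyTorch) so that automatic differentiation supplies the gradients; the plain-Python version only shows the arithmetic.
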
Zhuoguang Chen
Shanghai Artificial Intelligence Laboratory, IIIS, Tsinghua University
Kenan Li
Assistant Professor, Saint Louis University
public health, GIS, spatial statistics, system dynamics, geo-AI
Xiuyu Yang
IIIS, Tsinghua University
Tao Jiang
IIIS, Tsinghua University
Yiming Li
New York University
Hang Zhao
Shanghai Artificial Intelligence Laboratory, IIIS, Tsinghua University, Shanghai Qi Zhi Institute