AI Summary
Existing approaches struggle to achieve both fine-grained 3D geometric modeling and explicit cross-frame object association in dynamic scenes: they either track objects coarsely with bounding boxes or predict detailed voxel occupancy without temporal association. This work proposes LaGS, an end-to-end, camera-based method for 4D panoptic occupancy tracking that combines a latent Gaussian representation with voxel-based occupancy prediction. Multi-view image observations are first fused into sparse 3D Gaussians that carry latent features; these features are then splatted onto a 3D voxel grid, which a mask-based segmentation head decodes into temporally consistent panoptic occupancy. The design targets efficient aggregation of multi-view information and explicit temporal association, and the method achieves state-of-the-art performance for 4D panoptic occupancy tracking on the Occ3D nuScenes and Waymo datasets.
Abstract
Capturing 4D spatiotemporal surroundings is crucial for the safe and reliable operation of robots in dynamic environments. However, most existing methods address only one side of the problem: they either provide coarse geometric tracking via bounding boxes, or detailed 3D structures like voxel-based occupancy that lack explicit temporal association. In this work, we present Latent Gaussian Splatting for 4D Panoptic Occupancy Tracking (LaGS) that advances spatiotemporal scene understanding in a holistic direction. Our approach incorporates camera-based end-to-end tracking with mask-based multi-view panoptic occupancy prediction, and addresses the key challenge of efficiently aggregating multi-view information into 3D voxel grids via a novel latent Gaussian splatting approach. Specifically, we first fuse observations into 3D Gaussians that serve as a sparse point-centric latent representation of the 3D scene, and then splat the aggregated features onto a 3D voxel grid that is decoded by a mask-based segmentation head. We evaluate LaGS on the Occ3D nuScenes and Waymo datasets, achieving state-of-the-art performance for 4D panoptic occupancy tracking. We make our code available at https://lags.cs.uni-freiburg.de/.
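To make the "splat the aggregated features onto a 3D voxel grid" step more concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes isotropic Gaussians, a dense grid, and a normalized weighted average of features; the function name `splat_features_to_voxels` and all of its parameters are hypothetical. The actual method may use anisotropic covariances, opacities, and sparse operations.

```python
import numpy as np


def splat_features_to_voxels(means, sigmas, feats, grid_min, voxel_size,
                             grid_shape, eps=1e-8):
    """Splat per-Gaussian latent features onto a dense voxel grid.

    Illustrative assumptions (not taken from the paper):
      means:  (N, 3) Gaussian centres in metric coordinates
      sigmas: (N,)   isotropic standard deviations
      feats:  (N, C) latent feature vector per Gaussian
    Returns an array of shape (X, Y, Z, C).
    """
    means = np.asarray(means, dtype=np.float64)
    sigmas = np.asarray(sigmas, dtype=np.float64)
    feats = np.asarray(feats, dtype=np.float64)

    # Coordinates of voxel centres along each axis.
    axes = [grid_min[d] + (np.arange(grid_shape[d]) + 0.5) * voxel_size
            for d in range(3)]
    centres = np.stack(np.meshgrid(*axes, indexing="ij"), axis=-1)
    centres = centres.reshape(-1, 3)  # (V, 3)

    # Squared distance from every voxel centre to every Gaussian centre.
    d2 = ((centres[:, None, :] - means[None, :, :]) ** 2).sum(axis=-1)  # (V, N)

    # Unnormalized Gaussian weight of each Gaussian at each voxel centre.
    weights = np.exp(-0.5 * d2 / sigmas[None, :] ** 2)  # (V, N)

    # Weighted average of features; eps keeps empty voxels finite (near zero).
    numerator = weights @ feats  # (V, C)
    denominator = weights.sum(axis=1, keepdims=True)  # (V, 1)
    grid = numerator / (denominator + eps)

    return grid.reshape(*grid_shape, feats.shape[1])
```

In this sketch, the resulting `(X, Y, Z, C)` feature volume is what a downstream mask-based segmentation head would consume. The dense `(V, N)` weight matrix is only practical for small grids; a realistic version would restrict each Gaussian to the voxels in its neighborhood.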