🤖 AI Summary
3D bounding boxes represent irregularly shaped obstacles poorly in autonomous driving. This paper proposes GUIDE, a unified framework that uses 3D Gaussian ellipsoids for both instance detection and occupancy prediction. It introduces a sparse Gaussian-to-voxel projection strategy (Gaussian-to-Voxel Splatting) that yields fine-grained, instance-level occupancy without the computational cost of dense voxel grids, and it combines instance-level occupancy prediction with multi-frame association of Gaussian parameters to track 3D instances. On the nuScenes benchmark, GUIDE achieves an instance occupancy mAP of 21.61, a 50% improvement over existing methods, together with competitive tracking performance. Key contributions include: (1) the first unified modeling of 3D Gaussians across perception tasks; (2) a novel sparse Gaussian-to-voxel projection mechanism; and (3) an end-to-end trainable joint framework for instance occupancy prediction and tracking.
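The summary does not say how Gaussian parameters are associated across frames, so the following is only an illustrative sketch of one simple option: greedy nearest-neighbour matching on per-instance Gaussian centres. The function name `associate_instances`, the `max_dist` gate, and the use of centres alone (ignoring covariance and learned features) are assumptions made for this example, not GUIDE's actual tracking procedure.

```python
import numpy as np

def associate_instances(prev_means, curr_means, max_dist=2.0):
    """Greedily match instances between two frames by Gaussian centre distance.

    Hypothetical helper: returns a list of (prev_index, curr_index) pairs.
    Unmatched current instances would start new tracks.
    """
    prev_means = np.asarray(prev_means, dtype=float).reshape(-1, 3)
    curr_means = np.asarray(curr_means, dtype=float).reshape(-1, 3)
    if len(prev_means) == 0 or len(curr_means) == 0:
        return []

    # Pairwise Euclidean distances, shape (num_prev, num_curr).
    dist = np.linalg.norm(prev_means[:, None, :] - curr_means[None, :, :], axis=-1)

    matches, used_prev, used_curr = [], set(), set()
    # Visit candidate pairs from closest to farthest.
    for flat in np.argsort(dist, axis=None):
        p, c = (int(v) for v in np.unravel_index(flat, dist.shape))
        if dist[p, c] > max_dist:
            break  # every remaining pair is farther than the gate
        if p in used_prev or c in used_curr:
            continue  # each instance is matched at most once
        matches.append((p, c))
        used_prev.add(p)
        used_curr.add(c)
    return matches

prev = [[0.0, 0.0, 0.0], [10.0, 0.0, 0.0]]
curr = [[10.5, 0.0, 0.0], [0.3, 0.0, 0.0], [50.0, 0.0, 0.0]]
matches = associate_instances(prev, curr)
```

Here the third current instance lies outside the distance gate and stays unmatched. A real tracker would likely use a richer affinity than centre distance; the sketch only shows the general shape of frame-to-frame association.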
📝 Abstract
In autonomous driving, accurately detecting surrounding obstacles is crucial for effective decision-making. Traditional methods primarily rely on 3D bounding boxes to represent these obstacles, which often fail to capture the complexity of irregularly shaped, real-world objects. To overcome these limitations, we present GUIDE, a novel framework that utilizes 3D Gaussians for instance detection and occupancy prediction. Unlike conventional occupancy prediction methods, GUIDE also offers robust tracking capabilities. Our framework employs a sparse representation strategy, using Gaussian-to-Voxel Splatting to provide fine-grained, instance-level occupancy data without the computational demands associated with dense voxel grids. Experiments on the nuScenes dataset show that GUIDE reaches an instance occupancy mAP of 21.61, a 50% improvement over existing methods, alongside competitive tracking performance. GUIDE establishes a new benchmark in autonomous perception systems, combining precision with computational efficiency to better address the complexities of real-world driving environments.
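To make the idea of sparse Gaussian-to-voxel splatting concrete, here is a minimal sketch under stated assumptions. Each Gaussian writes only into the voxels inside its own few-sigma bounding box, and results are kept in a dictionary keyed by voxel index rather than in a dense grid. The function name `splat_gaussians`, the parameters `voxel_size`, `n_sigma`, and `threshold`, and the rule that the highest-weight Gaussian claims a voxel are all assumptions for illustration; the abstract does not specify GUIDE's actual splatting rule.

```python
import numpy as np

def splat_gaussians(means, covs, instance_ids, voxel_size=0.5, n_sigma=2.0, threshold=0.1):
    """Splat 3D Gaussians into a sparse voxel map (hypothetical sketch).

    Returns {(i, j, k): (weight, instance_id)} for touched voxels only.
    """
    occupancy = {}
    for mean, cov, inst in zip(means, covs, instance_ids):
        mean = np.asarray(mean, dtype=float)
        cov = np.asarray(cov, dtype=float)
        inv_cov = np.linalg.inv(cov)

        # Axis-aligned box enclosing the n_sigma ellipsoid of this Gaussian.
        half = n_sigma * np.sqrt(np.diag(cov))
        lo = np.floor((mean - half) / voxel_size).astype(int)
        hi = np.floor((mean + half) / voxel_size).astype(int)

        # Enumerate only the voxels inside that box.
        axes = [np.arange(a, b + 1) for a, b in zip(lo, hi)]
        idx = np.stack(np.meshgrid(*axes, indexing="ij"), axis=-1).reshape(-1, 3)
        centers = (idx + 0.5) * voxel_size

        # Unnormalised Gaussian weight at each voxel centre.
        diff = centers - mean
        weight = np.exp(-0.5 * np.einsum("ni,ij,nj->n", diff, inv_cov, diff))

        for voxel, w in zip(map(tuple, idx.tolist()), weight):
            # Keep the strongest contribution per voxel, with its instance label.
            if w >= threshold and w > occupancy.get(voxel, (0.0, -1))[0]:
                occupancy[voxel] = (float(w), inst)
    return occupancy

means = np.array([[0.25, 0.25, 0.25], [5.25, 0.25, 0.25]])
covs = np.stack([0.25 * np.eye(3), 0.25 * np.eye(3)])
occ = splat_gaussians(means, covs, instance_ids=[7, 9])
```

The cost of this loop scales with the number of Gaussians and the size of their local boxes, not with the full scene volume, which is the general motivation for avoiding dense voxel grids.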