AI Summary
Current vision-centric methods for 3D occupancy and scene flow prediction capture local detail poorly and have weak spatial discriminability in sparse 3D space. To address this, we propose a fine-grained dynamic modeling framework comprising: (1) an explicit feature-renovation paradigm driven by the occupancy state; (2) a sparse occlusion-aware attention mechanism combined with a cascaded refinement strategy; and (3) a long-term dynamic interaction module that reduces computational cost while preserving spatial information. Together, these components strengthen the representation of 3D voxel features. Quantitatively, the approach achieves state-of-the-art RayIoU and mAVE, reduces training GPU memory to 8.7 GB, and improves prediction accuracy while lowering computational cost.
Abstract
3D occupancy and scene flow offer a detailed and dynamic representation of 3D scenes. Given the sparsity and complexity of 3D space, previous vision-centric methods have employed implicit, learning-based approaches to model spatial and temporal information. However, these approaches struggle to capture local details and weaken the model's spatial discriminative ability. To address these challenges, we propose a novel explicit state-based modeling method that leverages the occupied state to renovate 3D features. Specifically, we propose a sparse occlusion-aware attention mechanism, integrated with a cascade refinement strategy, that accurately renovates 3D features under the guidance of occupied-state information. Additionally, we introduce a novel method for modeling long-term dynamic interactions that reduces computational cost while preserving spatial information. Compared with previous state-of-the-art methods, our efficient explicit renovation strategy not only delivers superior performance in RayIoU (occupancy) and mAVE (scene flow) but also markedly reduces GPU memory usage during training, bringing it down to 8.7 GB. Our code is available at https://github.com/lzzzzzm/STCOcc
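The abstract describes state-guided feature renovation only at a high level. As a rough illustration of the general idea, and not the STCOcc implementation, the sketch below assumes that per-voxel feature vectors and predicted occupancy probabilities are already available. Only voxels whose occupancy exceeds a threshold are updated, and they attend only to one another, so empty space is skipped entirely. The function names, the fixed threshold schedule used as a stand-in for the "cascade", and the use of raw features as queries, keys, and values (no learned projections, no occlusion reasoning) are all hypothetical simplifications.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def occupancy_guided_sparse_attention(
    feats: List[Vector],
    occ_prob: Sequence[float],
    threshold: float = 0.5,
) -> List[Vector]:
    """Hypothetical single refinement stage.

    Voxels with occupancy probability above `threshold` form the sparse set;
    each of them attends only to voxels in that set. All other voxels are
    copied through unchanged.
    """
    active = [i for i, p in enumerate(occ_prob) if p > threshold]
    out = [list(f) for f in feats]  # copy so the input is never mutated
    if not active:
        return out
    dim = len(feats[0])
    scale = 1.0 / math.sqrt(dim)
    for i in active:
        q = feats[i]
        # Scaled dot-product scores against occupied voxels only (sparse keys).
        scores = [scale * sum(a * b for a, b in zip(q, feats[j])) for j in active]
        weights = softmax(scores)
        # Attention-weighted sum of occupied-voxel features (values = features).
        agg = [sum(w * feats[j][d] for w, j in zip(weights, active)) for d in range(dim)]
        # Residual update: the voxel's own feature plus the aggregated context.
        out[i] = [x + y for x, y in zip(q, agg)]
    return out


def cascade_refine(
    feats: List[Vector],
    occ_prob: Sequence[float],
    thresholds: Tuple[float, ...] = (0.3, 0.5, 0.7),
) -> List[Vector]:
    """Hypothetical cascade: re-apply the sparse update with increasingly
    strict occupancy thresholds, feeding each stage's output to the next."""
    for t in thresholds:
        feats = occupancy_guided_sparse_attention(feats, occ_prob, t)
    return feats


# Usage: three voxels with 2-D features; voxel 2 is predicted empty (0.1),
# so it falls below every threshold and keeps its feature [5.0, 5.0].
feats = [[1.0, 0.0], [0.0, 1.0], [5.0, 5.0]]
occ = [0.9, 0.8, 0.1]
refined = cascade_refine(feats, occ)
print(refined)
```

Restricting both the updated voxels and the attended keys to the occupied set means the attention cost grows with the number of occupied voxels rather than with the full grid, which is the usual motivation for sparse designs in mostly empty 3D scenes.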