STCOcc: Sparse Spatial-Temporal Cascade Renovation for 3D Occupancy and Scene Flow Prediction

📅 2025-04-28

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Current vision-centric 3D occupancy and scene flow prediction methods capture local detail poorly and have weak spatial discriminability in sparse 3D space. To address this, we propose an explicit, occupancy-state-based modeling framework comprising: (1) an explicit feature renovation paradigm guided by the occupied state; (2) a sparse occlusion-aware attention mechanism coupled with a cascade refinement strategy; and (3) a long-term dynamic interaction modeling module that reduces computational cost while preserving spatial information. Compared with previous state-of-the-art methods, the approach delivers superior RayIoU and mAVE for occupancy and scene flow prediction and reduces GPU memory usage during training to 8.7 GB.

📝 Abstract

3D occupancy and scene flow offer a detailed and dynamic representation of 3D scenes. Recognizing the sparsity and complexity of 3D space, previous vision-centric methods have employed implicit learning-based approaches to model spatial and temporal information. However, these approaches struggle to capture local details and diminish the model's spatial discriminative ability. To address these challenges, we propose a novel explicit state-based modeling method designed to leverage the occupied state to renovate the 3D features. Specifically, we propose a sparse occlusion-aware attention mechanism, integrated with a cascade refinement strategy, which accurately renovates 3D features with the guidance of occupied state information. Additionally, we introduce a novel method for modeling long-term dynamic interactions, which reduces computational costs and preserves spatial information. Compared to previous state-of-the-art methods, our efficient explicit renovation strategy not only delivers superior performance in terms of RayIoU and mAVE for occupancy and scene flow prediction but also markedly reduces GPU memory usage during training, bringing it down to 8.7 GB. Our code is available at https://github.com/lzzzzzm/STCOcc

Problem

Research questions and friction points this paper is trying to address.

Improving 3D occupancy and scene flow prediction accuracy

Enhancing local detail capture in sparse 3D spaces

Reducing computational costs while preserving spatial information

Innovation

Methods, ideas, or system contributions that make the work stand out.

Explicit state-based modeling for 3D feature renovation

Sparse occlusion-aware attention with cascade refinement

Efficient long-term dynamic interaction modeling
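
To make the idea of state-guided, cascaded refinement more concrete, here is a minimal NumPy sketch. It is not the paper's implementation: the function names (`select_occupied`, `cascade_refine`), the top-k selection rule, the `keep_ratios` schedule, and the generic `refine_fn` callback are all assumptions introduced for illustration. The sketch only shows the general pattern of using a predicted occupancy probability to pick a sparse subset of voxels and refining that subset over several stages, leaving the remaining voxels untouched.

```python
import numpy as np


def select_occupied(occ_prob, keep_ratio):
    """Return flat indices of the top `keep_ratio` fraction of voxels
    ranked by predicted occupancy probability (hypothetical rule)."""
    flat = occ_prob.ravel()
    k = max(1, int(round(keep_ratio * flat.size)))
    # argpartition finds the k largest entries without a full sort.
    return np.argpartition(flat, flat.size - k)[flat.size - k:]


def cascade_refine(feats, occ_prob, refine_fn, keep_ratios=(0.5, 0.25)):
    """Refine voxel features stage by stage on a shrinking sparse subset.

    feats:     (N, C) array of voxel features.
    occ_prob:  (N,) array of predicted occupancy probabilities.
    refine_fn: callable mapping (M, C) -> (M, C); a stand-in for any
               refinement operator (e.g. an attention layer).
    """
    out = feats.copy()
    for ratio in keep_ratios:
        idx = select_occupied(occ_prob, ratio)
        # Only the selected (likely occupied) voxels are updated.
        out[idx] = refine_fn(out[idx])
    return out


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feats = rng.normal(size=(1000, 16))
    occ_prob = rng.uniform(size=1000)
    refined = cascade_refine(feats, occ_prob, lambda x: x / 2.0)
    print(refined.shape)
```

In this sketch the occupancy probabilities stay fixed across stages; a real system would more plausibly re-predict the occupied state after each stage and use the updated estimate to guide the next one.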

🔎 Similar Papers

OccRWKV: Rethinking Efficient 3D Semantic Occupancy Prediction with Linear Complexity