STCOcc: Sparse Spatial-Temporal Cascade Renovation for 3D Occupancy and Scene Flow Prediction

📅 2025-04-28
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
Current vision-centric 3D occupancy and scene flow prediction methods suffer from insufficient local detail capture and weak spatial discriminability in sparse 3D space. To address this, we propose a fine-grained dynamic modeling framework comprising: (1) a novel occupancy-state-aware explicit feature reconstruction paradigm; (2) a sparse occlusion-aware attention mechanism coupled with a cascaded optimization strategy; and (3) a long-range dynamic interaction modeling module that balances computational efficiency and spatial fidelity. Together, these components strengthen the 3D voxel feature representation. Quantitatively, the approach achieves state-of-the-art RayIoU and mAVE while reducing GPU memory consumption during training to 8.7 GB.

📝 Abstract
3D occupancy and scene flow offer a detailed and dynamic representation of a 3D scene. Recognizing the sparsity and complexity of 3D space, previous vision-centric methods have employed implicit learning-based approaches to model spatial and temporal information. However, these approaches struggle to capture local details and diminish the model's spatial discriminative ability. To address these challenges, we propose a novel explicit state-based modeling method designed to leverage the occupied state to renovate the 3D features. Specifically, we propose a sparse occlusion-aware attention mechanism, integrated with a cascade refinement strategy, which accurately renovates 3D features with the guidance of occupied state information. Additionally, we introduce a novel method for modeling long-term dynamic interactions, which reduces computational costs and preserves spatial information. Compared to the previous state-of-the-art methods, our efficient explicit renovation strategy not only delivers superior performance in terms of RayIoU and mAVE for occupancy and scene flow prediction but also markedly reduces GPU memory usage during training, bringing it down to 8.7 GB. Our code is available at https://github.com/lzzzzzm/STCOcc
Problem

Research questions and friction points this paper is trying to address.

Improving 3D occupancy and scene flow prediction accuracy
Enhancing local detail capture in sparse 3D spaces
Reducing computational costs while preserving spatial information
Innovation

Methods, ideas, or system contributions that make the work stand out.

Explicit state-based modeling for 3D feature renovation
Sparse occlusion-aware attention with cascade refinement
Efficient long-term dynamic interaction modeling
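
The idea of renovating features under the guidance of the occupied state, applied in cascaded stages, can be illustrated with a small NumPy sketch. This is not the authors' implementation: the function names (`renovate_sparse`, `cascade`), the top-k selection rule, and the `keep_ratio` parameter are all assumptions made for illustration, and the `refine` callable merely stands in for the paper's sparse occlusion-aware attention, which is not reproduced here.

```python
import numpy as np

def renovate_sparse(features, occ_prob, keep_ratio, refine):
    """Refine only the voxels most likely to be occupied (hypothetical helper).

    features:   (N, C) array of flattened voxel features
    occ_prob:   (N,) predicted occupancy probability per voxel
    keep_ratio: fraction of voxels treated as occupied (assumed parameter)
    refine:     callable mapping (K, C) -> (K, C); placeholder for attention
    """
    n = features.shape[0]
    k = min(n, max(1, int(round(n * keep_ratio))))
    # Indices of the k voxels with the highest occupancy probability.
    idx = np.argpartition(occ_prob, n - k)[n - k:]
    out = features.copy()
    out[idx] = refine(features[idx])  # all other voxels are left untouched
    return out, idx

def cascade(features, occ_heads, refine_fns, keep_ratios):
    """Run several refinement stages, re-estimating occupancy before each."""
    for head, refine, ratio in zip(occ_heads, refine_fns, keep_ratios):
        occ_prob = head(features)
        features, _ = renovate_sparse(features, occ_prob, ratio, refine)
    return features
```

Because only the selected voxels pass through `refine`, the cost of each stage scales with the number of voxels kept rather than with the full grid, which is the general motivation for sparse processing in this setting.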
Zhimin Liao
National Key Laboratory of Human-Machine Hybrid Augmented Intelligence, Institute of Artificial Intelligence and Robotics, Xi’an Jiaotong University
Ping Wei
Fudan University
Multimedia security, Image synthesis
Shuaijia Chen
National Key Laboratory of Human-Machine Hybrid Augmented Intelligence, Institute of Artificial Intelligence and Robotics, Xi’an Jiaotong University
Haoxuan Wang
PhD, University of Illinois Chicago
Machine Learning Efficiency
Ziyang Ren
National Key Laboratory of Human-Machine Hybrid Augmented Intelligence, Institute of Artificial Intelligence and Robotics, Xi’an Jiaotong University