FSHNet: Fully Sparse Hybrid Network for 3D Object Detection

📅 2025-06-04
🤖 AI Summary
Existing fully sparse 3D detectors process only non-empty voxels, which weakens long-range feature interactions and leaves object-center features missing. These two problems limit detection accuracy and optimization stability, respectively. To address them, we propose FSHNet, whose SlotFormer block is a sparse Transformer module that partitions voxels into slots rather than conventional windows to strengthen long-range modeling. We further design a dynamic sparse label assignment strategy to mitigate the scarcity of positive samples, and introduce a lightweight sparse upsampling module to recover fine-grained geometric structure. The entire framework operates exclusively on sparse tensors, balancing computational efficiency and detection performance. Our method achieves state-of-the-art results on Waymo Open Dataset, nuScenes, and Argoverse2, with particularly notable improvements in small-object detection. Code is publicly available.

📝 Abstract
Fully sparse 3D detectors have recently gained significant attention due to their efficiency in long-range detection. However, sparse 3D detectors extract features only from non-empty voxels, which impairs long-range interactions and leaves center features missing. The former weakens the feature extraction capability, while the latter hinders network optimization. To address these challenges, we introduce the Fully Sparse Hybrid Network (FSHNet). FSHNet incorporates a proposed SlotFormer block to enhance the long-range feature extraction capability of existing sparse encoders. The SlotFormer divides sparse voxels using a slot partition approach, which provides a larger receptive field than traditional window partition. Additionally, we propose a dynamic sparse label assignment strategy that optimizes the network more thoroughly by providing more high-quality positive samples. To further enhance performance, we introduce a sparse upsampling module to refine downsampled voxels, preserving fine-grained details crucial for detecting small objects. Extensive experiments on the Waymo, nuScenes, and Argoverse2 benchmarks demonstrate the effectiveness of FSHNet. The code is available at https://github.com/Say2L/FSHNet.
Problem

Research questions and friction points this paper is trying to address.

Enhance long-range feature extraction in sparse 3D detectors
Address the missing center feature issue in sparse voxel networks
Improve detection of small objects via sparse upsampling
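
The missing center feature issue can be illustrated with a toy example. The sketch below is not taken from the FSHNet code; it assumes a 2D bird's-eye-view grid, a made-up voxel size, and a hypothetical helper `box_surface_points` that samples points only on the outline of a box, mimicking LiDAR returns that land on object surfaces. After voxelization, the voxel containing the box center has no points, so a detector that only keeps non-empty voxels has no feature there.

```python
# Toy 2D bird's-eye-view illustration; all sizes are made-up values.
VOXEL_SIZE = 0.5  # metres per voxel edge (assumed)


def voxelize(points, voxel_size=VOXEL_SIZE):
    """Return the set of non-empty voxel indices for a list of (x, y) points."""
    return {(int(x // voxel_size), int(y // voxel_size)) for x, y in points}


def box_surface_points(cx, cy, length, width, step=0.1):
    """Sample points on the outline of an axis-aligned box (hypothetical helper).

    This stands in for LiDAR returns, which hit object surfaces, not interiors.
    """
    x0, x1 = cx - length / 2, cx + length / 2
    y0, y1 = cy - width / 2, cy + width / 2
    n_x = int(round(length / step))
    n_y = int(round(width / step))
    points = []
    for i in range(n_x + 1):  # bottom and top edges
        x = x0 + i * step
        points.append((x, y0))
        points.append((x, y1))
    for j in range(n_y + 1):  # left and right edges
        y = y0 + j * step
        points.append((x0, y))
        points.append((x1, y))
    return points


center = (10.25, 5.25)
points = box_surface_points(*center, length=4.0, width=2.0)
occupied = voxelize(points)

center_voxel = (int(center[0] // VOXEL_SIZE), int(center[1] // VOXEL_SIZE))
center_is_empty = center_voxel not in occupied
print(center_voxel, center_is_empty)
```

A center-based detection head that regresses boxes from the voxel at the object center would have nothing to read at `center_voxel` here, which is one way to picture why the abstract says this issue hinders network optimization.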
Innovation

Methods, ideas, or system contributions that make the work stand out.

SlotFormer block enhances long-range feature extraction
Dynamic sparse label assignment provides more high-quality positive samples for optimization
Sparse upsampling module refines downsampled voxels
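
The difference between window partition and slot partition can be sketched on a handful of sparse voxel coordinates. This summary does not define the slot partition precisely, so the sketch below rests on an assumption: a slot is taken to be a strip that is binned along one axis and unbounded along the other. The function names, the width of 4, and the sample coordinates are all hypothetical and are not the FSHNet implementation.

```python
from collections import defaultdict


def window_partition(coords, window=4):
    """Group voxel (x, y) indices into square windows of side `window`."""
    groups = defaultdict(list)
    for x, y in coords:
        groups[(x // window, y // window)].append((x, y))
    return dict(groups)


def slot_partition(coords, slot_width=4, axis=0):
    """Group voxels into strips: binned along `axis`, unbounded along the other.

    Assumed interpretation of "slot"; the source text does not spell it out.
    """
    groups = defaultdict(list)
    for c in coords:
        groups[c[axis] // slot_width].append(c)
    return dict(groups)


def max_span(groups):
    """Largest per-axis distance between two voxels that share a group."""
    best = 0
    for members in groups.values():
        xs = [x for x, _ in members]
        ys = [y for _, y in members]
        best = max(best, max(xs) - min(xs), max(ys) - min(ys))
    return best


# A few non-empty voxels scattered over a long scene (made-up values).
coords = [(1, 0), (2, 3), (1, 40), (3, 90), (9, 5), (10, 70)]

windows = window_partition(coords)
slots = slot_partition(coords)
print(len(windows), max_span(windows))
print(len(slots), max_span(slots))
```

Under this assumption, voxels that are far apart along the unbounded axis end up in the same group, so attention computed within a group can relate them directly. With square windows the same voxels fall into separate groups and can only interact through stacked layers.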
Shuai Liu, School of Computer Science and Engineering, Sun Yat-sen University
Mingyue Cui, Sun Yat-sen University (Intelligent networked vehicles, autonomous driving, multi-sensor fusion)
Boyang Li, Key Laboratory of Machine Intelligence and Advanced Computing, Sun Yat-sen University
Quanmin Liang, Sun Yat-sen University (Multimodal, Embodied AI)
Tinghe Hong, School of Computer Science and Engineering, Sun Yat-sen University
Kai Huang, School of Computer Science and Engineering, Sun Yat-sen University
Yunxiao Shan, School of Artificial Intelligence, Sun Yat-sen University