Learning Class Prototypes for Unified Sparse Supervised 3D Object Detection

📅 2025-03-27
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing sparse-supervision 3D detection methods are confined to outdoor scenes and lack unified indoor-outdoor modeling capability. This paper proposes the first unified sparse-supervision 3D detection framework applicable to both indoor and outdoor scenes, requiring only one annotated object per scene. Our method introduces two key innovations: (1) a prototype-driven unlabeled object matching mechanism that aligns category prototypes with point cloud features via optimal transport; and (2) a multi-label collaborative refinement module enabling cross-scene pseudo-label generation, quality assessment, and joint optimization. Evaluated on ScanNet V2, SUN RGB-D, and KITTI, our approach achieves 78%, 90%, and 96% of fully supervised performance, respectively—substantially outperforming prior sparse-supervision methods. To our knowledge, this is the first work to establish a high-accuracy, broadly generalizable unified sparse-supervision paradigm for 3D detection.

📝 Abstract
Both indoor and outdoor scene perceptions are essential for embodied intelligence. However, current sparse supervised 3D object detection methods focus solely on outdoor scenes without considering indoor settings. To this end, we propose a unified sparse supervised 3D object detection method for both indoor and outdoor scenes through learning class prototypes to effectively utilize unlabeled objects. Specifically, we first propose a prototype-based object mining module that converts the unlabeled object mining into a matching problem between class prototypes and unlabeled features. By using optimal transport matching results, we assign prototype labels to high-confidence features, thereby achieving the mining of unlabeled objects. We then present a multi-label cooperative refinement module to effectively recover missed detections through pseudo label quality control and prototype label cooperation. Experiments show that our method achieves state-of-the-art performance under the one object per scene sparse supervised setting across indoor and outdoor datasets. With only one labeled object per scene, our method achieves about 78%, 90%, and 96% performance compared to the fully supervised detector on ScanNet V2, SUN RGB-D, and KITTI, respectively, highlighting the scalability of our method. Code is available at https://github.com/zyrant/CPDet3D.
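The prototype-based object mining module described above casts unlabeled-object discovery as an optimal transport matching between class prototypes and point cloud features. A minimal sketch of that matching step, assuming entropy-regularized Sinkhorn iterations, a cosine cost, uniform marginals, and a hypothetical confidence threshold (the paper does not publish these exact choices, so treat every parameter here as an assumption):

```python
import numpy as np

def sinkhorn(cost, n_iters=50, eps=0.05):
    """Entropy-regularized optimal transport (Sinkhorn-Knopp).

    cost: (num_features, num_prototypes) matching cost matrix.
    Returns a transport plan with (approximately) uniform marginals.
    """
    K = np.exp(-cost / eps)            # Gibbs kernel
    n, m = cost.shape
    r = np.full(n, 1.0 / n)            # uniform marginal over features
    c = np.full(m, 1.0 / m)            # uniform marginal over prototypes
    u = np.ones(n)
    for _ in range(n_iters):
        # alternate scaling: u = r / (K v), v = c / (K^T u)
        u = r / (K @ (c / (K.T @ u)))
    v = c / (K.T @ u)
    return u[:, None] * K * v[None, :]

def assign_prototype_labels(features, prototypes, conf_thresh=0.5):
    """Match unlabeled features to class prototypes; keep only
    high-confidence assignments (threshold is a placeholder)."""
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    p = prototypes / np.linalg.norm(prototypes, axis=1, keepdims=True)
    cost = 1.0 - f @ p.T               # cosine distance as matching cost
    plan = sinkhorn(cost)
    conf = plan / plan.sum(axis=1, keepdims=True)  # per-feature confidence
    labels = conf.argmax(axis=1)       # prototype label per feature
    keep = conf.max(axis=1) >= conf_thresh
    return labels, keep
```

The transport plan distributes each feature's (uniform) mass across prototypes; row-normalizing it gives a soft class assignment, and only features whose best match is confident enough receive a prototype label, mirroring the "high-confidence features" criterion in the summary.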
Problem

Research questions and friction points this paper is trying to address.

Unified 3D object detection for indoor and outdoor scenes
Sparse supervision with only one labeled object per scene
Learning class prototypes to utilize unlabeled objects effectively
Innovation

Methods, ideas, or system contributions that make the work stand out.

Prototype-based object mining for unlabeled features
Multi-label refinement with pseudo label control
Unified 3D detection for indoor and outdoor scenes
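The multi-label refinement bullet above pairs pseudo label quality control with prototype label cooperation. One plausible reading is that a pseudo box survives only when the detector is confident and its class agrees with the prototype-assigned label; the sketch below implements that reading, with the threshold and the agreement rule both being assumptions rather than the paper's published procedure:

```python
import numpy as np

def refine_pseudo_labels(pseudo_scores, pseudo_classes, proto_classes,
                         score_thresh=0.3):
    """Keep a pseudo label only if (a) its detector confidence clears a
    threshold and (b) its predicted class matches the class assigned by
    prototype matching. Both criteria are illustrative assumptions."""
    confident = pseudo_scores >= score_thresh
    agrees = pseudo_classes == proto_classes
    return confident & agrees
```

In this reading, disagreement between the two label sources acts as a quality signal: recovered detections feed back into training only when both the detector and the prototype matcher vote for the same class.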
👥 Authors
Yun Zhu (PCA Lab, Nanjing University of Science and Technology, Nanjing, China)
Le Hui (Northwestern Polytechnical University)
Hang Yang (PCA Lab, Nanjing University of Science and Technology, Nanjing, China)
Jianjun Qian (Nanjing University of Science and Technology)
Jin Xie (State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, China; School of Intelligence Science and Technology, Nanjing University, Suzhou, China)
Jian Yang (PCA Lab, Nanjing University of Science and Technology, Nanjing, China)