🤖 AI Summary
This work addresses severe class imbalance and high annotation cost in 3D occupancy prediction, both of which stem from its voxel-based representation. The authors propose the first active learning framework that incorporates class distribution information, selecting informative samples through three criteria: frequency-weighted uncertainty, inter-sample diversity, and intra-set diversity within each acquisition batch, so that rare classes are covered. To mitigate map memorization bias, they evaluate on a geographically disjoint train/validation split of Occ3D-nuScenes, and they further test generality on SemanticKITTI with a different architecture. Using only 42.4% of the labeled data, the method reaches a mean Intersection-over-Union (mIoU) of 26.62, comparable to full supervision and better than existing active learning baselines at the same labeling budget.
📝 Abstract
3D occupancy prediction provides dense spatial understanding critical for safe autonomous driving. However, this task suffers from a severe class imbalance due to its volumetric representation, where safety-critical objects (bicycles, traffic cones, pedestrians) occupy minimal voxels compared to dominant backgrounds. Additionally, voxel-level annotation is costly, yet dedicating effort to dominant classes is inefficient. To address these challenges, we propose a class-distribution guided active learning framework for selecting training samples to annotate in autonomous driving datasets. Our approach combines three complementary criteria to select the training samples. Inter-sample diversity prioritizes samples whose predicted class distributions differ from those of the labeled set, intra-set diversity prevents redundant sampling within each acquisition cycle, and frequency-weighted uncertainty emphasizes rare classes by reweighting voxel-level entropy with inverse per-sample class proportions. We ensure evaluation validity by using a geographically disjoint train/validation split of Occ3D-nuScenes, which reduces train-validation overlap and mitigates potential map memorization. With only 42.4% labeled data, our framework reaches 26.62 mIoU, comparable to full supervision and outperforming active learning baselines at the same budget. We further validate generality on SemanticKITTI using a different architecture, demonstrating consistent effectiveness across datasets.
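The three selection criteria described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the use of total variation distance to compare class distributions, the weighting of each voxel by the inverse proportion of its predicted (argmax) class, the weight normalization, and the equal-weight additive combination of the three scores are all assumptions made for illustration. Each sample is assumed to be an array of per-voxel softmax probabilities with shape (num_voxels, num_classes).

```python
import numpy as np


def class_proportions(probs):
    """Fraction of voxels whose predicted (argmax) class is each class."""
    num_classes = probs.shape[1]
    counts = np.bincount(probs.argmax(axis=1), minlength=num_classes)
    return counts / counts.sum()


def frequency_weighted_uncertainty(probs, eps=1e-6):
    """Voxel entropy averaged with inverse per-sample class-proportion weights.

    Assumption: each voxel is weighted by 1 / proportion of its predicted
    class within this sample, and the weights are normalized to sum to 1.
    """
    entropy = -(probs * np.log(probs + eps)).sum(axis=1)
    props = class_proportions(probs)
    weights = 1.0 / (props[probs.argmax(axis=1)] + eps)
    return float((weights * entropy).sum() / weights.sum())


def tv_distance(p, q):
    """Total variation distance between two class distributions (assumed metric)."""
    return 0.5 * float(np.abs(np.asarray(p) - np.asarray(q)).sum())


def select_batch(cand_props, labeled_props, uncertainties, budget):
    """Greedy acquisition combining the three criteria with equal weights.

    inter: distance from the candidate to the labeled set's class distribution.
    intra: distance from the candidate to the closest sample already picked
           in this acquisition cycle (0 for the first pick).
    """
    selected = []
    remaining = list(range(len(cand_props)))
    while remaining and len(selected) < budget:
        best, best_score = None, -np.inf
        for i in remaining:
            inter = tv_distance(cand_props[i], labeled_props)
            intra = min(
                (tv_distance(cand_props[i], cand_props[j]) for j in selected),
                default=0.0,
            )
            score = uncertainties[i] + inter + intra
            if score > best_score:
                best, best_score = i, score
        selected.append(best)
        remaining.remove(best)
    return selected
```

In this sketch, a sample whose uncertain voxels fall in a rare predicted class scores higher than one whose uncertain voxels fall in a dominant class, and the intra-set term discourages picking two candidates with near-identical class distributions in the same cycle.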