Multi-Scale Neighborhood Occupancy Masked Autoencoder for Self-Supervised Learning in LiDAR Point Clouds

📅 2025-02-27
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the information leakage and computational redundancy caused by the sparsity of 3D space in LiDAR point cloud self-supervised pre-training, this paper proposes NOMAE, a neighborhood-occupancy-constrained multi-scale masked autoencoding framework that can be applied directly to existing 3D backbones. Its core contributions are: (1) a neighborhood occupancy reconstruction mechanism that predicts occupancy only in the neighborhood of non-masked voxels, preventing occupancy information from leaking into the decoder through large empty regions; and (2) a hierarchical mask generation strategy that applies voxel masking and occupancy reconstruction at multiple scales, capturing objects of different sizes while keeping the masks sparse and computationally efficient. Evaluated on the nuScenes and Waymo Open datasets for the downstream tasks of semantic segmentation and 3D object detection, the framework achieves state-of-the-art performance, demonstrating strong effectiveness and generalization in real-world autonomous driving scenarios.
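The neighborhood-restricted reconstruction target in contribution (1) can be illustrated with a minimal sketch: occupancy labels are generated only for cells within a small radius of a visible (non-masked) voxel, so distant empty space never becomes a trivial target. All names, the dictionary-based representation, and the cubic neighborhood are assumptions for illustration, not the paper's implementation:

```python
import numpy as np

def neighborhood_occupancy_targets(voxels, visible_mask, radius=1):
    """Build occupancy labels only near visible voxels (illustrative sketch).

    voxels: (N, 3) int array of occupied voxel coordinates.
    visible_mask: (N,) bool array, True for non-masked voxels.
    Returns a dict mapping candidate cell -> occupancy label (0 or 1);
    cells outside every visible voxel's neighborhood are simply absent,
    so the decoder never predicts (or sees) far-away empty regions.
    """
    occupied = {tuple(v) for v in voxels}
    visible = {tuple(v) for v in voxels[visible_mask]}
    # All offsets in a cubic neighborhood, excluding the center cell.
    offsets = [(dx, dy, dz)
               for dx in range(-radius, radius + 1)
               for dy in range(-radius, radius + 1)
               for dz in range(-radius, radius + 1)
               if (dx, dy, dz) != (0, 0, 0)]
    targets = {}
    for v in voxels[visible_mask]:
        for dx, dy, dz in offsets:
            cell = (int(v[0]) + dx, int(v[1]) + dy, int(v[2]) + dz)
            if cell in visible:
                continue  # visible voxels are inputs, not targets
            if cell not in targets:
                targets[cell] = int(cell in occupied)
    return targets
```

For example, with one visible voxel at the origin, only its 26 neighboring cells receive labels; an occupied voxel far outside that neighborhood contributes no target at all.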

📝 Abstract
Masked autoencoders (MAE) have shown tremendous potential for self-supervised learning (SSL) in vision and beyond. However, point clouds from LiDARs used in automated driving are particularly challenging for MAEs since large areas of the 3D volume are empty. Consequently, existing work suffers from leaking occupancy information into the decoder and has significant computational complexity, thereby limiting the SSL pre-training to only 2D bird's eye view encoders in practice. In this work, we propose the novel neighborhood occupancy MAE (NOMAE) that overcomes the aforementioned challenges by employing masked occupancy reconstruction only in the neighborhood of non-masked voxels. We incorporate voxel masking and occupancy reconstruction at multiple scales with our proposed hierarchical mask generation technique to capture features of objects of different sizes in the point cloud. NOMAEs are extremely flexible and can be directly employed for SSL in existing 3D architectures. We perform extensive evaluations on the nuScenes and Waymo Open datasets for the downstream perception tasks of semantic segmentation and 3D object detection, comparing with both discriminative and generative SSL methods. The results demonstrate that NOMAE sets the new state-of-the-art on multiple benchmarks for multiple point cloud perception tasks.
Problem

Research questions and friction points this paper is trying to address.

Large empty regions in LiDAR point clouds leak occupancy information into the MAE decoder.
High computational complexity limits SSL pre-training to 2D bird's eye view encoders in practice.
Pre-trained features must transfer to both semantic segmentation and 3D object detection in autonomous driving.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Neighborhood occupancy masked autoencoder
Hierarchical mask generation technique
Multi-scale voxel masking and reconstruction
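One way to realize multi-scale masking as listed above is top-down propagation: sample a mask over coarse voxel cells, keep those cells masked at every finer scale, and sample additional masks only among the still-visible cells. This is a hedged sketch under that assumption; the function name, the scale set, and the sampling scheme are illustrative, not the paper's hierarchical mask generation:

```python
import numpy as np

def hierarchical_masks(coords, scales=(4, 2, 1), mask_ratio=0.5, seed=0):
    """Propagate random masks from coarse to fine voxel scales (sketch).

    coords: (N, 3) int array of occupied voxel coordinates at the finest scale.
    scales: cell sizes from coarsest to finest (must nest, e.g. powers of two).
    Returns a list of (N,) bool arrays, one per scale (True = masked).
    Once a voxel is masked at a coarse scale it stays masked at finer ones.
    """
    rng = np.random.default_rng(seed)
    masked = np.zeros(len(coords), dtype=bool)
    masks = []
    for s in scales:
        cells = coords // s                        # parent cell index at scale s
        keys = np.unique(cells[~masked], axis=0)   # cells still fully visible
        n_mask = int(round(mask_ratio * len(keys)))
        chosen = keys[rng.choice(len(keys), size=n_mask, replace=False)]
        chosen_set = {tuple(c) for c in chosen}
        newly = np.array([tuple(c) in chosen_set for c in cells])
        masked = masked | newly                    # masked cells stay masked
        masks.append(masked.copy())
    return masks
```

Because the scales nest, every coarse cell contains whole cells of the next finer scale, so the per-scale masks are consistent: the mask at each scale is a superset of the one before it.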