AI Summary
To address inefficient multi-sensor fusion, inadequate occlusion modeling, and poor small-object detection in 3D semantic occupancy prediction for autonomous driving, this paper proposes an efficient and robust camera-LiDAR fusion framework. Our method introduces three key innovations: (1) Hierarchical Voxel Feature Refinement (HVFR) to enhance dense LiDAR representation; (2) a Multi-scale Occupancy Decoder (MOD) incorporating an explicit 'occluded' class to model occlusion relationships; and (3) a Pixel to Voxel Fusion Network (PVF-Net) leveraging deformable attention for high-precision cross-modal alignment. On nuScenes-Occupancy, our approach achieves +5.2% IoU and +5.3% mIoU over the baseline, while significantly reducing parameter count and FLOPs. Extensive evaluation on SemanticKITTI further demonstrates strong generalization across diverse scenes and sensor configurations.
Abstract
Accurate 3D perception is essential for understanding the environment in autonomous driving. Recent advancements in 3D semantic occupancy prediction have leveraged camera-LiDAR fusion to improve robustness and accuracy. However, current methods allocate computational resources uniformly across all voxels, leading to inefficiency, and they fail to adequately address occlusions, reducing accuracy in challenging scenarios. We propose MR-Occ, a novel approach for camera-LiDAR fusion-based 3D semantic occupancy prediction that addresses these challenges through three key components: Hierarchical Voxel Feature Refinement (HVFR), Multi-scale Occupancy Decoder (MOD), and Pixel to Voxel Fusion Network (PVF-Net). HVFR improves performance by enhancing features for critical voxels, reducing computational cost. MOD introduces an 'occluded' class to better handle regions obscured from sensor view, improving accuracy. PVF-Net leverages densified LiDAR features to effectively fuse camera and LiDAR data through a deformable attention mechanism. Extensive experiments demonstrate that MR-Occ achieves state-of-the-art performance on the nuScenes-Occupancy dataset, surpassing previous approaches by +5.2% in IoU and +5.3% in mIoU while using fewer parameters and FLOPs. Moreover, MR-Occ demonstrates superior performance on the SemanticKITTI dataset, further validating its effectiveness and generalizability across diverse 3D semantic occupancy benchmarks.
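To make the deformable-attention fusion idea in PVF-Net concrete, below is a minimal single-query NumPy sketch of the general mechanism: a voxel query predicts a few sampling offsets around its camera-projected reference point, bilinearly samples the image feature map at those locations, and aggregates the samples with softmax attention weights. All names (`bilinear_sample`, `deformable_cross_attention`, the weight matrices `W_off`, `W_attn`) are illustrative assumptions, not the paper's actual implementation, which the abstract does not specify in detail.

```python
import numpy as np

def bilinear_sample(feat, x, y):
    """Bilinearly sample an (H, W, C) feature map at continuous (x, y),
    assuming (x, y) lies inside the map."""
    H, W, _ = feat.shape
    x0, y0 = int(np.floor(x)), int(np.floor(y))
    x1, y1 = min(x0 + 1, W - 1), min(y0 + 1, H - 1)
    wx, wy = x - x0, y - y0
    return ((1 - wx) * (1 - wy) * feat[y0, x0]
            + wx * (1 - wy) * feat[y0, x1]
            + (1 - wx) * wy * feat[y1, x0]
            + wx * wy * feat[y1, x1])

def deformable_cross_attention(voxel_feat, img_feat, ref_xy,
                               W_off, W_attn, num_points=4):
    """One voxel query attends to `num_points` learned sampling locations
    around its projected reference point `ref_xy` = (x, y) in the image."""
    # Predict 2D sampling offsets from the voxel (query) feature.
    offsets = (voxel_feat @ W_off).reshape(num_points, 2)
    # Softmax attention weights over the sampled points.
    logits = voxel_feat @ W_attn
    weights = np.exp(logits - logits.max())
    weights /= weights.sum()
    # Bilinearly sample the camera feature map at each offset location.
    sampled = np.stack([bilinear_sample(img_feat,
                                        ref_xy[0] + dx, ref_xy[1] + dy)
                        for dx, dy in offsets])        # (num_points, C)
    # Attention-weighted aggregation of the sampled image features.
    return weights @ sampled                           # fused (C,) feature
```

In practice this runs batched over many voxel queries, multiple heads, and multiple feature scales; the sketch keeps a single query to show how offset prediction, bilinear sampling, and attention weighting fit together.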