🤖 AI Summary
Pure vision-based 3D occupancy prediction suffers from limited performance in complex scenes, while camera-radar fusion is hindered by radar sparsity and noise, leading to insufficient feature representation. To address these challenges, this paper proposes REOcc, a novel camera-radar fusion network. Its core innovation is the Radar Densification and Amplification (RDA) module, which employs learnable voxel-wise interpolation and context-aware enhancement to significantly improve radar feature density and robustness. Furthermore, a cross-modal feature interaction mechanism is introduced to enable fine-grained alignment and complementary fusion of image semantics and radar geometry. Evaluated on the Occ3D-nuScenes benchmark, REOcc substantially outperforms pure vision methods: notably, it achieves a 12.7% mIoU gain for dynamic objects (e.g., vehicles and pedestrians), demonstrating the effectiveness of its multimodal complementary modeling.
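To make the densification idea concrete, here is a minimal sketch (not the paper's implementation) of voxel-wise radar densification: empty voxels are filled by interpolating features from occupied neighbors. In the actual RDA module these interpolation weights are learnable; the uniform 6-neighbor averaging and the function name below are illustrative assumptions, and boundary wraparound from `np.roll` is ignored for brevity.

```python
import numpy as np

def densify_radar_voxels(feat, occ, n_iters=1):
    """Fill empty radar voxels by averaging occupied 6-neighbors.

    feat: (X, Y, Z, C) voxel features, zeros where empty.
    occ:  (X, Y, Z) boolean occupancy mask.
    Note: REOcc uses learnable interpolation weights; uniform
    neighbor averaging here is purely for illustration.
    """
    feat, occ = feat.copy(), occ.copy()
    offsets = [(1, 0, 0), (-1, 0, 0), (0, 1, 0),
               (0, -1, 0), (0, 0, 1), (0, 0, -1)]
    for _ in range(n_iters):
        acc = np.zeros_like(feat)
        cnt = np.zeros(occ.shape, dtype=np.int32)
        for off in offsets:
            # Gather each neighbor's features, masked by its occupancy.
            sf = np.roll(feat, off, axis=(0, 1, 2))
            so = np.roll(occ, off, axis=(0, 1, 2))
            acc += sf * so[..., None]
            cnt += so
        # Only fill voxels that are empty but have occupied neighbors.
        fill = (~occ) & (cnt > 0)
        feat[fill] = acc[fill] / cnt[fill][..., None]
        occ |= fill
    return feat, occ
```

One pass propagates features one voxel outward; iterating `n_iters` times grows the support region, loosely mirroring how densification counteracts radar sparsity before fusion with image features.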
📝 Abstract
Vision-based 3D occupancy prediction has made significant advances, but reliance on cameras alone struggles in challenging environments. This limitation has driven the adoption of sensor fusion, among which camera-radar fusion stands out as a promising solution thanks to the complementary strengths of the two sensors. However, the sparsity and noise of radar data limit its effectiveness, leading to suboptimal fusion performance. In this paper, we propose REOcc, a novel camera-radar fusion network designed to enrich radar feature representations for 3D occupancy prediction. Our approach introduces two main components, a Radar Densifier and a Radar Amplifier, which refine radar features by integrating spatial and contextual information, effectively enhancing their spatial density and quality. Extensive experiments on the Occ3D-nuScenes benchmark demonstrate that REOcc achieves significant performance gains over the camera-only baseline model, particularly on dynamic object classes. These results underscore REOcc's capability to mitigate the sparsity and noise of radar data. Consequently, radar complements camera data more effectively, unlocking the full potential of camera-radar fusion for robust and reliable 3D occupancy prediction.