🤖 AI Summary
To address the degradation of camera-radar fusion performance caused by image depth ambiguity in back-projection-based BEV transformation, this paper proposes CRAB, a novel framework that (1) explicitly constrains image depth distributions using high-precision sparse depth priors from radar, thereby mitigating depth ambiguity during inverse projection; and (2) introduces a radar-context-enhanced cross-attention mechanism that achieves fine-grained alignment and fusion of image features with radar occupancy information directly in BEV space. CRAB integrates inverse projection, view-specific feature aggregation, and spatially adaptive radar fusion into a single end-to-end trainable architecture for high-fidelity BEV representation learning. Evaluated on nuScenes, CRAB achieves 62.4% NDS and 54.0% mAP, setting a new state of the art among back-projection-based camera-radar fusion methods for 3D detection and semantic segmentation.
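The core idea above, using sparse but precise radar returns to constrain a dense but unreliable image depth distribution, can be sketched as follows. This is an illustrative toy in NumPy, not the paper's implementation: the function name `radar_constrained_depth`, the reweighting scheme, and the `alpha` knob are all assumptions for the sake of the example.

```python
import numpy as np

def radar_constrained_depth(img_depth_logits, radar_occupancy, alpha=1.0):
    """Sharpen a per-pixel image depth distribution with sparse radar evidence.

    img_depth_logits : (D,) unnormalized depth-bin scores from the image branch
    radar_occupancy  : (D,) occupancy over the same depth bins
                       (1 near a radar return, 0 elsewhere)
    alpha            : strength of the radar prior (hypothetical knob)
    """
    # Dense but unreliable depth distribution from the image (softmax).
    p_img = np.exp(img_depth_logits - img_depth_logits.max())
    p_img /= p_img.sum()
    # Upweight bins supported by radar; the additive 1.0 keeps bins without
    # radar evidence alive, since radar is sparse rather than exhaustive.
    w = 1.0 + alpha * radar_occupancy
    p = p_img * w
    return p / p.sum()

# Toy example: the image depth is ambiguous between bins 3 and 7;
# a radar return near bin 7 resolves the ambiguity.
logits = np.zeros(10)
logits[3] = logits[7] = 2.0
occ = np.zeros(10)
occ[7] = 1.0
p = radar_constrained_depth(logits, occ, alpha=4.0)
print(p.argmax())  # bin 7 wins after the radar constraint
```

In the toy example, the two candidate bins start with equal image probability; the radar hit breaks the tie, which is exactly the kind of depth disambiguation along a camera ray the summary describes.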
📄 Abstract
Recently, camera-radar fusion-based 3D object detection methods in bird's eye view (BEV) have gained attention due to the complementary characteristics and cost-effectiveness of these sensors. Previous approaches using forward projection struggle with sparse BEV feature generation, while those employing backward projection overlook depth ambiguity, leading to false positives. To address these limitations, we propose CRAB (Camera-Radar fusion for reducing depth Ambiguity in Backward projection-based view transformation), a novel camera-radar fusion-based 3D object detection and segmentation model that uses backward projection and leverages radar to mitigate depth ambiguity. During the view transformation, CRAB aggregates perspective-view image context features into BEV queries. It improves depth distinction among queries along the same ray by combining the dense but unreliable depth distribution from images with the sparse yet precise depth information from radar occupancy. We further introduce spatial cross-attention with a feature map containing radar context information to enhance comprehension of the 3D scene. On the nuScenes dataset, our approach achieves state-of-the-art performance among backward projection-based camera-radar fusion methods, with 62.4% NDS and 54.0% mAP in 3D object detection.
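The spatial cross-attention with a radar-context feature map can be pictured as a BEV query attending jointly over sampled image features and a radar feature at the same BEV cell. The sketch below is a minimal single-query, single-head version in NumPy; the function name, the plain dot-product attention, and the idea of appending the radar feature as an extra key/value are assumptions for illustration (the actual model likely uses multi-head deformable attention over multiple cameras).

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def radar_context_cross_attention(bev_query, img_feats, radar_feat):
    """One BEV query attends over image features plus a radar context feature.

    bev_query  : (C,)   embedding of one BEV query
    img_feats  : (K, C) image features sampled at the query's projected locations
    radar_feat : (C,)   radar context feature at the same BEV cell
    """
    # Append the radar context feature as an additional key/value, so the
    # query can draw on radar occupancy cues alongside image appearance.
    keys = np.vstack([img_feats, radar_feat[None, :]])
    attn = softmax(keys @ bev_query / np.sqrt(len(bev_query)))
    return attn @ keys  # attention-weighted aggregation into the BEV query

rng = np.random.default_rng(0)
C, K = 8, 4
out = radar_context_cross_attention(
    rng.normal(size=C), rng.normal(size=(K, C)), rng.normal(size=C)
)
print(out.shape)  # (8,)
```

Treating the radar feature as one more attended-to token is only one way to realize "spatial cross-attention with a radar context feature map"; the point of the sketch is that the BEV query fuses both modalities in a single attention step.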