CRAB: Camera-Radar Fusion for Reducing Depth Ambiguity in Backward Projection-Based View Transformation

📅 2025-09-06
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
To address the degradation of camera-radar fusion performance caused by image depth ambiguity in backward-projection-based BEV transformation, this paper proposes CRAB, a framework that (1) explicitly constrains the image depth distribution using high-precision sparse depth priors from radar, mitigating depth ambiguity during backward projection, and (2) introduces a radar-context-enhanced spatial cross-attention mechanism to align and fuse image features with radar occupancy information directly in BEV space. CRAB integrates backward projection, view-specific feature aggregation, and spatially adaptive radar fusion into a single end-to-end trainable architecture for high-fidelity BEV representation learning. Evaluated on nuScenes, CRAB achieves 62.4% NDS and 54.0% mAP, setting a new state of the art among backward-projection-based camera-radar fusion methods for 3D object detection and segmentation.

πŸ“ Abstract
Recently, camera-radar fusion-based 3D object detection methods in bird's eye view (BEV) have gained attention due to the complementary characteristics and cost-effectiveness of these sensors. Previous approaches using forward projection struggle with sparse BEV feature generation, while those employing backward projection overlook depth ambiguity, leading to false positives. In this paper, to address the aforementioned limitations, we propose a novel camera-radar fusion-based 3D object detection and segmentation model named CRAB (Camera-Radar fusion for reducing depth Ambiguity in Backward projection-based view transformation), using a backward projection that leverages radar to mitigate depth ambiguity. During the view transformation, CRAB aggregates perspective view image context features into BEV queries. It improves depth distinction among queries along the same ray by combining the dense but unreliable depth distribution from images with the sparse yet precise depth information from radar occupancy. We further introduce spatial cross-attention with a feature map containing radar context information to enhance the comprehension of the 3D scene. When evaluated on the nuScenes open dataset, our proposed approach achieves a state-of-the-art performance among backward projection-based camera-radar fusion methods with 62.4% NDS and 54.0% mAP in 3D object detection.
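The core idea described above, sharpening a dense but unreliable image depth distribution with sparse but precise radar occupancy along each camera ray, can be illustrated with a minimal sketch. This is not the paper's actual implementation; the function name, the multiplicative re-weighting rule, and the `lam` hyperparameter are all assumptions made for illustration:

```python
import numpy as np

def fuse_depth_along_ray(p_img, radar_occ, lam=2.0):
    """Re-weight a dense image depth distribution with sparse radar occupancy.

    p_img:     (D,) softmax depth distribution from the image branch (dense, noisy)
    radar_occ: (D,) radar occupancy per depth bin (sparse, mostly zero, precise)
    lam:       strength of the radar prior (hypothetical hyperparameter)
    """
    w = p_img * (1.0 + lam * radar_occ)  # boost depth bins supported by radar returns
    return w / w.sum()                   # renormalize to a valid distribution

# Toy example: 8 depth bins along one camera ray.
p_img = np.full(8, 1 / 8)                   # worst case: image depth is uniform (ambiguous)
radar_occ = np.zeros(8); radar_occ[5] = 1.0 # radar reports a return in bin 5

p_fused = fuse_depth_along_ray(p_img, radar_occ)
print(p_fused.argmax())  # → 5: the radar prior resolves the ambiguity
```

BEV queries along the same ray could then weight the image features they aggregate by `p_fused`, so that queries at radar-supported depths dominate, which is one way to reduce the false positives that pure backward projection produces.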
Problem

Research questions and friction points this paper is trying to address.

Reduces depth ambiguity in backward projection view transformation
Improves 3D object detection using camera-radar fusion
Addresses sparse BEV features and false positive issues
Innovation

Methods, ideas, or system contributions that make the work stand out.

Camera-radar fusion using backward projection to reduce depth ambiguity
Combines dense image depth with sparse radar depth information
Introduces spatial cross-attention with radar context for 3D scenes
In-Jae Lee
Interdisciplinary Program in Artificial Intelligence, Seoul National University
Sihwan Hwang
Cho Chun Shik Graduate School of Mobility, KAIST
Youngseok Kim
42dot Inc
Wonjune Kim
Electronics and Telecommunications Research Institute (ETRI)
Sanmin Kim
Department of Automobile and IT Convergence, Kookmin University
Dongsuk Kum
Vehicle Dynamics & Control, KAIST