Instance-Guided Radar Depth Estimation for 3D Object Detection

📅 2026-01-27
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
Monocular camera-based 3D object detection suffers from depth ambiguity and degrades under adverse environmental conditions. Radar is robust to such conditions, but its sparse point clouds and low resolution hinder direct use in detection tasks. To address this, the work proposes InstaRadar, an instance segmentation-guided radar point densification method that uses semantic masks to increase radar point density and align radar data with image semantics. The authors further integrate a pre-trained RCDPT module into the BEVDepth framework, replacing its original depth estimation component and enabling, for the first time, joint optimization with explicit radar-guided depth supervision. Experiments show that InstaRadar achieves state-of-the-art performance in radar-guided depth estimation and significantly improves 3D detection accuracy within the BEVDepth pipeline, validating the value of radar-informed depth estimation.
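The core densification idea described above (spreading sparse radar depths over instance masks) can be illustrated with a minimal sketch. This is a hypothetical reconstruction, not the paper's actual implementation: the function name, the nearest-return tie-breaking rule, and the input shapes are all assumptions.

```python
import numpy as np

def densify_radar_with_masks(radar_uv, radar_depth, instance_masks, h, w):
    """Spread each projected radar point's depth over the instance mask it hits.

    radar_uv:       (N, 2) integer pixel coordinates (u=column, v=row)
    radar_depth:    (N,) metric depths of those points
    instance_masks: (K, H, W) boolean masks from a pre-trained instance segmenter
    Returns a dense (H, W) depth map; 0 marks pixels with no radar evidence.
    """
    dense = np.zeros((h, w), dtype=np.float32)
    for (u, v), d in zip(radar_uv, radar_depth):
        if not (0 <= v < h and 0 <= u < w):
            continue  # point projects outside the image
        hit = False
        for mask in instance_masks:
            if mask[v, u]:
                # Fill the whole instance with this depth; where several radar
                # points hit the same object, keep the nearest return.
                region = mask & ((dense == 0) | (dense > d))
                dense[region] = d
                hit = True
        if not hit:
            dense[v, u] = d  # background points stay sparse
    return dense
```

The resulting dense map can then serve as a supervision or guidance signal for an image-conditioned depth network, which is the role the summary attributes to the RCDPT depth module.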

๐Ÿ“ Abstract
Accurate depth estimation is fundamental to 3D perception in autonomous driving, supporting tasks such as detection, tracking, and motion planning. However, monocular camera-based 3D detection suffers from depth ambiguity and reduced robustness under challenging conditions. Radar offers complementary advantages, including resilience to poor lighting and adverse weather, but its sparsity and low resolution limit its direct use in detection frameworks. This motivates effective radar-camera fusion with improved preprocessing and depth estimation strategies. We propose an end-to-end framework that enhances monocular 3D object detection through two key components. First, we introduce InstaRadar, an instance segmentation-guided expansion method that leverages pre-trained segmentation masks to increase radar density and semantic alignment, producing a more structured representation. InstaRadar achieves state-of-the-art results in radar-guided depth estimation, demonstrating its effectiveness in generating high-quality depth features. Second, we integrate the pre-trained RCDPT into the BEVDepth framework as a replacement for its depth module. With InstaRadar-enhanced inputs, the RCDPT integration consistently improves 3D detection performance. Together, these components yield steady gains over the baseline BEVDepth model, demonstrating the effectiveness of InstaRadar and the advantage of explicit depth supervision in 3D object detection. The framework still trails radar-camera fusion models that extract BEV features directly from radar, since radar here serves only as guidance rather than as an independent feature stream; this gap points to room for improvement. Future work will extend InstaRadar to point cloud-like representations and integrate a dedicated radar branch with temporal cues for enhanced BEV fusion.
Problem

Research questions and friction points this paper is trying to address.

depth estimation
3D object detection
radar-camera fusion
monocular vision
autonomous driving
Innovation

Methods, ideas, or system contributions that make the work stand out.

InstaRadar
instance-guided radar expansion
radar-camera fusion
monocular 3D object detection
depth estimation