Doracamom: Joint 3D Detection and Occupancy Prediction with Multi-view 4D Radars and Cameras for Omnidirectional Perception

📅 2025-01-26
📈 Citations: 0
Influential: 0
🤖 AI Summary
To improve the robustness of 3D perception for autonomous driving under adverse weather and low-texture conditions, this paper proposes Doracamom, which the authors describe as the first omnidirectional perception framework that fuses multi-view cameras with 4D imaging radar to perform 3D object detection and semantic occupancy prediction jointly. The method has three main components: a coarse voxel query generator that initializes voxel queries from radar geometry and image semantics ahead of Transformer-based refinement, a dual-branch temporal encoder that models temporal features in parallel in BEV and voxel space, and a cross-modal BEV-voxel fusion module that aligns and fuses features through attention. The whole model is trained with multi-task learning. The authors report state-of-the-art results for both tasks on OmniHD-Scenes, VoD, and TJ4DRadSet.

📝 Abstract
3D object detection and occupancy prediction are critical tasks in autonomous driving, attracting significant attention. Despite the potential of recent vision-based methods, they encounter challenges under adverse conditions. Thus, integrating cameras with next-generation 4D imaging radar to achieve unified multi-task perception is highly significant, though research in this domain remains limited. In this paper, we propose Doracamom, the first framework that fuses multi-view cameras and 4D radar for joint 3D object detection and semantic occupancy prediction, enabling comprehensive environmental perception. Specifically, we introduce a novel Coarse Voxel Queries Generator that integrates geometric priors from 4D radar with semantic features from images to initialize voxel queries, establishing a robust foundation for subsequent Transformer-based refinement. To leverage temporal information, we design a Dual-Branch Temporal Encoder that processes multi-modal temporal features in parallel across BEV and voxel spaces, enabling comprehensive spatio-temporal representation learning. Furthermore, we propose a Cross-Modal BEV-Voxel Fusion module that adaptively fuses complementary features through attention mechanisms while employing auxiliary tasks to enhance feature quality. Extensive experiments on the OmniHD-Scenes, View-of-Delft (VoD), and TJ4DRadSet datasets demonstrate that Doracamom achieves state-of-the-art performance in both tasks, establishing new benchmarks for multi-modal 3D perception. Code and models will be publicly available.
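
To make the idea of adaptively fusing BEV and voxel features more concrete, here is a minimal NumPy sketch. It is not the authors' Cross-Modal BEV-Voxel Fusion module: it assumes a simple per-cell sigmoid gate as a stand-in for the attention mechanism, and the function name `fuse_bev_voxel`, the gate weights `w_gate`, and the tensor shapes are all hypothetical choices made for illustration.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def fuse_bev_voxel(voxel_feat, bev_feat, w_gate, b_gate=0.0):
    """Blend voxel and BEV features with a per-cell gate in (0, 1).

    Hypothetical shapes (assumptions, not taken from the paper):
      voxel_feat: (C, Z, H, W)  features on a 3D voxel grid
      bev_feat:   (C, H, W)     features on the bird's-eye-view plane
      w_gate:     (2 * C,)      weights of a linear gate over both inputs
    """
    C, Z, H, W = voxel_feat.shape
    # Lift the BEV map into voxel space by repeating it along the height axis.
    bev_lifted = np.broadcast_to(bev_feat[:, None, :, :], (C, Z, H, W))
    # Stack both sources along the channel axis: (2C, Z, H, W).
    stacked = np.concatenate([voxel_feat, bev_lifted], axis=0)
    # One scalar logit per voxel cell, then squash to a weight in (0, 1).
    logits = np.tensordot(w_gate, stacked, axes=([0], [0])) + b_gate
    gate = sigmoid(logits)  # (Z, H, W)
    # Convex combination: gate favors voxel features, (1 - gate) favors BEV.
    fused = gate[None] * voxel_feat + (1.0 - gate[None]) * bev_lifted
    return fused, gate


rng = np.random.default_rng(0)
voxel = rng.normal(size=(4, 2, 3, 3))
bev = rng.normal(size=(4, 3, 3))
w = rng.normal(size=(8,))
fused, gate = fuse_bev_voxel(voxel, bev, w)
print(fused.shape)  # (4, 2, 3, 3)
```

In a trained model the gate weights would be learned jointly with the detection and occupancy heads; here they are random so that only the data flow is shown.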
Problem

Research questions and friction points this paper is trying to address.

Autonomous Vehicles
3D Object Recognition
Environmental Perception
Innovation

Methods, ideas, or system contributions that make the work stand out.

4D Radar-Camera Fusion
Time Series Analysis
Multi-modal 3D Perception
Authors

Lianqing Zheng
Tongji University Ph.D. student
BEV/OCC, VLA, 4D Radar Perception, Multimodal Fusion, Data Closed-Loop

Jianan Liu
Unknown affiliation
Signal Processing, Deep Learning, Sensing and Perception, Autonomous Driving, Medical Imaging

Runwei Guan
Hong Kong University of Science and Technology (Guangzhou) / Founder of FertiTech AI
Multi-Modal Learning, Unmanned Surface Vessel, Radar Perception, AI Medicine

Long Yang
School of Automotive Studies, Tongji University, Shanghai, China

Shouyi Lu
Tongji University
AIGC, Image Editing, Point Cloud Super-Resolution, Pose Estimation, Place Recognition

Yuanzhe Li
Chair of Automotive Engineering, Technische Universität Berlin, Berlin, Germany

Xiaokai Bai
Zhejiang University Ph.D. student
Multimodal Fusion, 3D Object Detection, 4D Radar Perception, Autonomous Driving

Jie Bai
School of Information and Electrical Engineering, Hangzhou City University, Hangzhou 310015, China

Zhixiong Ma
School of Automotive Studies, Tongji University, Shanghai, China

Hui-Liang Shen
College of Information Science and Electronic Engineering, Zhejiang University, Hangzhou 310027, China

Xichan Zhu
School of Automotive Studies, Tongji University, Shanghai, China