GraphBEV: Towards Robust BEV Feature Alignment for Multi-Modal 3D Object Detection

📅 2024-03-18
🏛️ European Conference on Computer Vision
📈 Citations: 8
✨ Influential: 0
🤖 AI Summary
To address misalignment of BEV features and depth estimation errors caused by LiDAR-camera calibration inaccuracies in multimodal 3D object detection, this paper proposes a dual-module correction framework: Local Align and Global Align. Local Align performs neighborhood-aware, graph-matching-based self-correction of depth estimates, explicitly modeling local geometric mismatches. Global Align mitigates global projection distortions by optimizing cross-modal alignment in the BEV feature space. This work is the first to explicitly model and compensate for geometric mismatches induced by calibration noise within the BEV fusion paradigm. On the nuScenes validation set, the method achieves 70.1% mAP, outperforming BEV Fusion by 1.6%. Under injected calibration noise, the performance gain widens to 8.3%, demonstrating significantly enhanced robustness. The framework thus bridges a critical gap between theoretical calibration assumptions and practical sensor deployment, improving both accuracy and reliability in real-world multimodal 3D perception.

๐Ÿ“ Abstract
Integrating LiDAR and camera information into a Bird's-Eye-View (BEV) representation has emerged as a crucial aspect of 3D object detection in autonomous driving. However, existing methods are susceptible to inaccuracies in the calibration relationship between the LiDAR and camera sensors. Such inaccuracies result in errors in depth estimation for the camera branch, ultimately causing misalignment between LiDAR and camera BEV features. In this work, we propose a robust fusion framework called GraphBEV. To address errors caused by inaccurate point cloud projection, we introduce a Local Align module that employs neighbor-aware depth features via graph matching. Additionally, we propose a Global Align module to rectify the misalignment between LiDAR and camera BEV features. Our GraphBEV framework achieves state-of-the-art performance, with an mAP of 70.1%, surpassing BEV Fusion by 1.6% on the nuScenes validation set. Importantly, GraphBEV outperforms BEV Fusion by 8.3% under conditions with misalignment noise.
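The Local Align idea described in the abstract, correcting each projected LiDAR point's depth using its neighbors so that a single mis-projected point cannot corrupt the camera branch's depth supervision, can be sketched as follows. This is an illustrative reimplementation, not the authors' code: the function name, the K-nearest-neighbor graph construction, and the simple inverse-distance weighting are all assumptions.

```python
# Illustrative sketch (not the paper's code) of neighbor-aware depth
# correction: build a KNN graph over projected LiDAR pixels and blend
# each point's depth with its neighborhood consensus, so that depths
# perturbed by calibration noise are pulled back toward their neighbors.
import numpy as np

def neighbor_aware_depth(pixels, depths, k=4):
    """pixels: (N, 2) projected LiDAR points; depths: (N,) raw depths.
    Returns depths smoothed over each point's K-nearest-neighbor graph."""
    n = len(depths)
    # Pairwise squared pixel distances (O(N^2); fine for a small sketch).
    d2 = ((pixels[:, None, :] - pixels[None, :, :]) ** 2).sum(-1)
    np.fill_diagonal(d2, np.inf)           # exclude self-matches
    nbrs = np.argsort(d2, axis=1)[:, :k]   # K nearest neighbors per point
    out = np.empty(n)
    for i in range(n):
        j = nbrs[i]
        w = 1.0 / (np.sqrt(d2[i, j]) + 1e-6)  # inverse-distance weights
        w /= w.sum()
        # Blend the point's own depth with its neighborhood consensus.
        out[i] = 0.5 * depths[i] + 0.5 * (w * depths[j]).sum()
    return out

# Toy example: one outlier depth caused by a mis-projected point
# gets pulled toward its neighbors' consensus.
pix = np.array([[10., 10.], [11., 10.], [10., 11.], [11., 11.], [10.5, 10.5]])
dep = np.array([20., 20., 20., 20., 35.])   # last point is an outlier
corrected = neighbor_aware_depth(pix, dep, k=4)
print(corrected[-1] < dep[-1])  # prints True: outlier is reduced toward 20
```

The 0.5/0.5 blend between a point's own depth and its neighborhood estimate is an arbitrary choice for illustration; the paper learns this correction rather than hard-coding it.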
Problem

Research questions and friction points this paper is trying to address.

Inaccurate calibration between LiDAR and camera sensors.
Misalignment of LiDAR and camera BEV features.
Errors in depth estimation for camera branch.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Graph BEV framework for robust feature alignment
Local Align module with neighbor-aware depth features
Global Align module to correct LiDAR-camera misalignment
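The Global Align module above corrects residual misalignment directly in BEV feature space. A minimal sketch of one plausible realization, predicting a per-cell 2D offset from the concatenated LiDAR and camera BEV features and warping the camera BEV map accordingly, is shown below. The class name, channel sizes, and offset-then-resample design are assumptions for illustration, not the paper's architecture.

```python
# Hedged sketch of BEV-space global alignment: a small conv head predicts
# a (dx, dy) offset per BEV cell from both modalities, and the camera BEV
# features are resampled with that offset before fusion with LiDAR BEV.
import torch
import torch.nn as nn
import torch.nn.functional as F

class GlobalAlignSketch(nn.Module):
    def __init__(self, c_lidar=128, c_cam=128):
        super().__init__()
        # Predicts a 2D offset per BEV cell from the concatenated features.
        self.offset = nn.Conv2d(c_lidar + c_cam, 2, kernel_size=3, padding=1)

    def forward(self, bev_lidar, bev_cam):
        b, _, h, w = bev_cam.shape
        offsets = self.offset(torch.cat([bev_lidar, bev_cam], dim=1))  # (B,2,H,W)
        # Base sampling grid in normalized [-1, 1] coordinates.
        ys, xs = torch.meshgrid(
            torch.linspace(-1, 1, h), torch.linspace(-1, 1, w), indexing="ij")
        base = torch.stack([xs, ys], dim=-1).expand(b, h, w, 2)
        # Scale cell-unit offsets to normalized units, then warp camera BEV.
        norm = offsets.permute(0, 2, 3, 1) / torch.tensor([w / 2.0, h / 2.0])
        aligned_cam = F.grid_sample(bev_cam, base + norm, align_corners=True)
        return torch.cat([bev_lidar, aligned_cam], dim=1)  # fused BEV features
```

With zero predicted offsets this reduces to plain channel concatenation, so the module can only improve on naive fusion as training shapes the offsets.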
Ziying Song (Beijing Jiaotong University)
Lei Yang (School of Vehicle and Mobility, Tsinghua University)
Shaoqing Xu (University of Macau, BUAA, Xiaomi EV)
Lin Liu (School of Computer and Information Technology, Beijing Jiaotong University)
Dongyang Xu (School of Vehicle and Mobility, Tsinghua University)
Caiyan Jia (School of Computer and Information Technology, Beijing Jiaotong University)
Feiyang Jia (Beijing Jiaotong University)
Li Wang (School of Mechanical Engineering, Beijing Institute of Technology)