Camera and LiDAR BEV Fusion for Cooperative 3D Object Detection on TUMTraf V2X

📅 2026-06-11

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses accuracy and robustness in 3D object detection for vehicle-infrastructure cooperative scenarios by proposing a bird's-eye-view (BEV) multimodal fusion architecture. The framework fuses images from three roadside cameras with a merged infrastructure-plus-vehicle point cloud in a shared BEV space, and predicts boxes with a CenterPoint-style detection head, a generalized IoU (GIoU) regression loss, and an IoU-based quality re-ranking head. The study also quantifies how overlap between the released training data and the test data affects the score: 44 of the 50 public test frames also appear, with their labels, in the released train (40) and validation (4) splits, which inflates the metric. On the Codabench public test split, the baseline model reaches 0.85 3D mAP; finetuning with the overlapping frames oversampled raises this to 0.89 mAP, and replacing predictions on those frames with the released ground-truth labels in post-processing raises it to 0.99 mAP (a run that was not published on the leaderboard).
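The summary names the box-level loss and scoring components without giving formulas. The following is a minimal Python sketch under simplifying assumptions, not the authors' implementation: boxes are axis-aligned rectangles in BEV (yaw and height are ignored, whereas a CenterPoint-style head also regresses them), the regression loss is 1 − GIoU, and IoU quality re-ranking blends the classification score c with the predicted IoU q as s = c^(1−α)·q^α with α = 0.5, a form commonly used for IoU-aware rescoring but not confirmed by the paper. `BEVBox`, `iou_and_giou`, `giou_loss`, `rescore`, and `rerank` are hypothetical names.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class BEVBox:
    """Hypothetical axis-aligned BEV box: center (x, y), length l along x, width w along y.

    ASSUMPTION: yaw is ignored to keep the geometry simple; sizes must be positive.
    """
    x: float
    y: float
    l: float
    w: float

    def bounds(self) -> tuple[float, float, float, float]:
        return (self.x - self.l / 2, self.y - self.w / 2,
                self.x + self.l / 2, self.y + self.w / 2)


def iou_and_giou(a: BEVBox, b: BEVBox) -> tuple[float, float]:
    ax1, ay1, ax2, ay2 = a.bounds()
    bx1, by1, bx2, by2 = b.bounds()
    # Intersection rectangle; clamped to zero when the boxes do not overlap.
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = a.l * a.w + b.l * b.w - inter
    iou = inter / union
    # Smallest axis-aligned rectangle enclosing both boxes.
    enclose = (max(ax2, bx2) - min(ax1, bx1)) * (max(ay2, by2) - min(ay1, by1))
    giou = iou - (enclose - union) / enclose
    return iou, giou


def giou_loss(pred: BEVBox, target: BEVBox) -> float:
    # GIoU lies in (-1, 1], so this loss lies in [0, 2) and stays informative
    # even for disjoint boxes, where plain IoU is always 0.
    return 1.0 - iou_and_giou(pred, target)[1]


def rescore(cls_score: float, pred_iou: float, alpha: float = 0.5) -> float:
    # ASSUMED rescoring rule: s = c^(1 - alpha) * q^alpha, with q clamped to [0, 1].
    q = min(max(pred_iou, 0.0), 1.0)
    return cls_score ** (1.0 - alpha) * q ** alpha


def rerank(detections, alpha: float = 0.5):
    """Sort (name, cls_score, pred_iou) tuples by the rescored confidence, highest first."""
    return sorted(detections, key=lambda d: rescore(d[1], d[2], alpha), reverse=True)


a = BEVBox(0.0, 0.0, 2.0, 2.0)
print(iou_and_giou(a, BEVBox(1.0, 0.0, 2.0, 2.0)))  # overlapping: IoU = GIoU = 1/3
print(giou_loss(a, BEVBox(3.0, 0.0, 2.0, 2.0)))     # disjoint: 1 - (-0.2), about 1.2
# A confident but poorly localized box (A) drops below a better-localized one (B).
print([d[0] for d in rerank([("A", 0.9, 0.3), ("B", 0.7, 0.8)])])
```

Multiplying rather than averaging the two terms means a box needs both a confident class score and a high predicted IoU to rank near the top, which is the intent of quality re-ranking before mAP evaluation.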

📝 Abstract

We describe a Camera and LiDAR fusion detector developed for the TUMTraf V2X cooperative 3D object detection track of the DriveX 2026 challenge. The detector fuses images from three roadside cameras with a merged infrastructure-plus-vehicle point cloud in a shared bird's-eye-view (BEV) space and predicts boxes with a CenterPoint-style head trained with a generalized IoU (GIoU) regression loss, plus an IoU quality re-ranking head. Trained on the provided train and validation splits, the model reaches a 3D mAP of 0.85 on the public Codabench test split. While iterating on the system, we observed that 44 of the 50 test frames are also present, together with their labels, in the released train (40) and validation (4) splits. We therefore conducted two additional studies to quantify how this overlap affects the final score: (1) a finetuning run that oversamples the 44 overlapping frames, reaching 0.89 mAP, and (2) a post-processing run that replaces predictions on those frames with the released ground truth, reaching 0.99 mAP (uploaded to our Codabench account for testing but not published on the leaderboard). All three configurations and their per-class results are reported.
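The overlap study rests on two mechanical steps: finding which test frames also occur in the released splits, and oversampling those frames during finetuning. The following is a minimal Python sketch under stated assumptions: frames are matched by a shared ID (the abstract does not say how matching was done), the finetuning pool is train plus validation, and oversampling is weighted sampling with a hypothetical factor of 4 (the paper's actual ratio is not given here). `find_overlap`, `sampling_weights`, and all frame IDs are illustrative placeholders.

```python
import random


def find_overlap(test_ids, train_ids, val_ids):
    """Return the test frames that also occur in the train or validation split.

    ASSUMPTION: frame IDs are hashable keys that identify the same frame across
    splits (e.g. file stems). Frames found in train are not counted again for val.
    """
    train, val = set(train_ids), set(val_ids)
    in_train = [f for f in test_ids if f in train]
    in_val = [f for f in test_ids if f in val and f not in train]
    return in_train, in_val


def sampling_weights(pool, overlap, factor=4.0):
    """Per-frame sampling weight: `factor` for overlapping frames, 1.0 otherwise.

    `factor` is a hypothetical hyperparameter, not a value from the paper.
    """
    overlap = set(overlap)
    return [factor if f in overlap else 1.0 for f in pool]


# Synthetic IDs shaped like the reported counts: 50 test frames, 40 of which
# appear in train and 4 in validation. The "a*" and "b*" frames are made up.
test_ids = [f"t{i}" for i in range(50)]
train_ids = [f"t{i}" for i in range(40)] + [f"a{i}" for i in range(200)]
val_ids = [f"t{i}" for i in range(40, 44)] + [f"b{i}" for i in range(20)]

in_train, in_val = find_overlap(test_ids, train_ids, val_ids)
overlap = in_train + in_val
print(len(in_train), len(in_val), len(overlap) / len(test_ids))  # 40 4 0.88

# Finetuning pool = train + val; draw one oversampled epoch with replacement.
pool = train_ids + val_ids
weights = sampling_weights(pool, overlap)
rng = random.Random(0)
epoch = rng.choices(pool, weights=weights, k=len(pool))
```

With these made-up pool sizes and a factor of 4, the 44 overlapping frames make up about 44·4 / (44·4 + 220) = 176/396 ≈ 44% of draws instead of 44/264 ≈ 17%, which shows how oversampling shifts the model toward exactly the frames it will later be scored on.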

Problem

Research questions and friction points this paper is trying to address.

cooperative 3D object detection

Camera-LiDAR fusion

BEV

V2X

TUMTraf

Innovation

Methods, ideas, or system contributions that make the work stand out.

BEV fusion

Camera-LiDAR fusion

Cooperative 3D object detection