Robust Fusion of Object-Level V2X for Learned 3D Object Detection

📅 2026-05-01

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the vulnerability of onboard perception to occlusion and low visibility, which compromises the reliability of automated driving. The authors emulate object-level vehicle-to-everything (V2X) messages from nuScenes ground truth, inject controlled noise and object dropout to mimic communication latency, localization errors, and low V2X penetration rates, and convert the messages into a dedicated bird's-eye-view (BEV) input that is fused into a BEVFusion-style detector. With V2X input the detector reaches a nuScenes Detection Score (NDS) of 0.80 under favorable conditions, but models trained on idealized V2X data become fragile and over-reliant on it. Noise-aware training combined with explicit confidence encoding retains the performance gains even under severe noise and reduced V2X penetration.

📝 Abstract

Perception for automated driving is largely based on onboard environmental sensors, such as cameras and radar, which are cost-effective but limited by line-of-sight and field-of-view constraints. These inherent limitations may cause onboard perception to fail under occlusions or poor visibility conditions. In parallel, cooperative awareness via vehicle-to-everything (V2X) communication is becoming increasingly available, enabling vehicles and infrastructure to share their own state as object-level information that complements onboard perception. In this work, we study how such V2X information can be integrated into 3D object detection and how robust the resulting system is to realistic V2X imperfections. Using the nuScenes dataset, we emulate object-level cooperative awareness messages from ground truth, injecting controlled noise and object dropout to mimic real-world conditions such as latency, localization errors, and low V2X penetration rates. We convert these messages into a dedicated bird's-eye view (BEV) input and fuse them into a BEVFusion-style detector. Our results demonstrate that while object-level cooperative information can substantially improve detection performance, achieving an NDS of 0.80 under favorable conditions, models trained on idealized data become fragile and over-reliant on V2X. Conversely, our proposed noise-aware training strategy, coupled with explicit confidence encoding, enhances robustness, maintaining performance gains even under severe noise and reduced V2X penetration.
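
To make the emulation pipeline described in the abstract more concrete, the following is a minimal sketch and not the authors' implementation. It assumes a zero-mean Gaussian localization error, a constant-velocity model for latency, independent per-object dropout for V2X penetration, and a two-channel BEV grid (occupancy plus confidence). The function names `emulate_v2x` and `rasterize_bev`, every parameter, the grid size, and the confidence formula are hypothetical choices made for illustration only.

```python
import numpy as np


def emulate_v2x(gt_xy, gt_vel, rng, penetration=0.5, pos_sigma=0.5, latency_s=0.1):
    """Emulate object-level V2X messages from ground-truth object states.

    gt_xy:  (N, 2) object centers in the ego frame, in metres.
    gt_vel: (N, 2) object velocities, in metres per second.
    Returns (M, 2) reported positions and (M,) confidences for the
    M <= N objects assumed to be V2X-equipped.
    """
    gt_xy = np.asarray(gt_xy, dtype=float)
    gt_vel = np.asarray(gt_vel, dtype=float)
    # Object dropout: each object transmits with probability `penetration`.
    keep = rng.random(len(gt_xy)) < penetration
    # Latency: the message describes where the object was `latency_s` ago
    # (constant-velocity assumption).
    stale_xy = gt_xy[keep] - gt_vel[keep] * latency_s
    # Localization error: zero-mean Gaussian noise on the reported position.
    noisy_xy = stale_xy + rng.normal(0.0, pos_sigma, size=stale_xy.shape)
    # Hypothetical confidence value that shrinks as the noise level grows.
    conf = np.full(len(noisy_xy), 1.0 / (1.0 + pos_sigma))
    return noisy_xy, conf


def rasterize_bev(xy, conf, grid_range=50.0, cell=0.5):
    """Rasterize messages into a (2, H, W) BEV grid: occupancy, confidence."""
    size = int(round(2 * grid_range / cell))
    bev = np.zeros((2, size, size), dtype=np.float32)
    # Map metric coordinates to integer cell indices (x -> column, y -> row).
    idx = np.floor((np.asarray(xy, dtype=float) + grid_range) / cell).astype(int)
    inside = np.all((idx >= 0) & (idx < size), axis=1)
    cols, rows = idx[inside, 0], idx[inside, 1]
    bev[0, rows, cols] = 1.0
    # Keep the highest confidence when several objects share one cell.
    np.maximum.at(bev[1], (rows, cols), np.asarray(conf)[inside].astype(np.float32))
    return bev


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    xy, conf = emulate_v2x([[0.0, 0.0], [10.0, -5.0]], [[5.0, 0.0], [0.0, 0.0]], rng)
    print(rasterize_bev(xy, conf).shape)
```

Under these assumptions, noise-aware training would amount to sampling `penetration`, `pos_sigma`, and `latency_s` from a range during training rather than fixing them at ideal values, so that the detector sees imperfect V2X input alongside a confidence channel that reflects it.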

Problem

Research questions and friction points this paper is trying to address.

V2X

3D object detection

robust fusion

object-level perception

cooperative awareness

Innovation

Methods, ideas, or system contributions that make the work stand out.

object-level V2X

BEV fusion

noise-aware training