Enhanced Cooperative Perception Through Asynchronous Vehicle to Infrastructure Framework with Delay Mitigation for Connected and Automated Vehicles

📅 2025-04-10
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
To address 3D perception deficiencies in autonomous vehicles at high-speed, complex intersections—caused by sensor blind spots—this paper proposes an asynchronous vehicle-infrastructure cooperative perception framework leveraging roadside monocular cameras. The method introduces the first asynchronous late-fusion V2I architecture specifically designed for monocular traffic cameras, integrating an end-to-end latency compensation mechanism and an asynchronous Kalman filter fusion strategy. This design eliminates reliance on LiDAR/RADAR and relaxes stringent time synchronization requirements. Evaluated on Waymo’s representative intersection scenarios in simulation, the framework achieves a 32% increase in perception range, a 41% improvement in blind-spot object detection rate, and a 1.8-second reduction in response latency, compared to baseline methods. These results demonstrate substantial enhancements in real-time performance and safety under challenging intersection conditions.

Technology Category

Application Category

📝 Abstract
Perception is a key component of Automated vehicles (AVs). However, sensors mounted to the AVs often encounter blind spots due to obstructions from other vehicles, infrastructure, or objects in the surrounding area. While recent advancements in planning and control algorithms help AVs react to sudden object appearances from blind spots at low speeds and less complex scenarios, challenges remain at high speeds and complex intersections. Vehicle to Infrastructure (V2I) technology promises to enhance scene representation for AVs in complex intersections, providing sufficient time and distance to react to adversary vehicles violating traffic rules. Most existing methods for infrastructure-based vehicle detection and tracking rely on LIDAR, RADAR or sensor fusion methods, such as LIDAR-Camera and RADAR-Camera. Although LIDAR and RADAR provide accurate spatial information, the sparsity of point cloud data limits its ability to capture detailed object contours of objects far away, resulting in inaccurate 3D object detection results. Furthermore, the absence of LIDAR or RADAR at every intersection increases the cost of implementing V2I technology. To address these challenges, this paper proposes a V2I framework that utilizes monocular traffic cameras at road intersections to detect 3D objects. The results from the roadside unit (RSU) are then combined with the on-board system using an asynchronous late fusion method to enhance scene representation. Additionally, the proposed framework provides a time delay compensation module to compensate for the processing and transmission delay from the RSU. Lastly, the V2I framework is tested by simulating and validating a scenario similar to the one described in an industry report by Waymo. The results show that the proposed method improves the scene representation and the AV's perception range, giving enough time and space to react to adversary vehicles.
Problem

Research questions and friction points this paper is trying to address.

Enhance AV perception in blind spots using V2I technology
Reduce reliance on expensive LIDAR/RADAR with monocular cameras
Compensate delays in RSU processing for accurate 3D detection
Innovation

Methods, ideas, or system contributions that make the work stand out.

Monocular traffic cameras for 3D object detection
Asynchronous late fusion with on-board system
Time delay compensation for processing delays
🔎 Similar Papers
No similar papers found.