🤖 AI Summary
This work addresses key limitations of ground-based autonomous driving perception, namely occlusion, blind spots, and limited sensing range, which existing vehicle-to-vehicle and vehicle-to-infrastructure cooperative systems struggle to overcome in large-scale occlusion and long-range scenarios. To bridge this gap, we propose and release the first large-scale, real-world vehicle-to-UAV (V2U) cooperative perception dataset, integrating multi-view LiDAR and RGB camera data from both a ground vehicle and an aerial drone. The dataset comprises 56K synchronized multimodal frames collected across urban, campus, and rural road environments, annotated with 700K 3D bounding boxes. Leveraging this resource, we establish benchmarks for both single-agent and V2U cooperative 3D detection and tracking, demonstrating that cross-perspective collaboration significantly enhances perception robustness and long-range performance, thereby filling a critical data gap in aerial-ground cooperative perception research.
📝 Abstract
Modern autonomous vehicle perception systems are often constrained by occlusions, blind spots, and limited sensing range. While existing cooperative perception paradigms, such as Vehicle-to-Vehicle (V2V) and Vehicle-to-Infrastructure (V2I), have demonstrated their effectiveness in mitigating these challenges, they remain limited to ground-level collaboration and cannot fully address large-scale occlusions or long-range perception in complex environments. To advance research in cross-view cooperative perception, we present V2U4Real, the first large-scale real-world multi-modal dataset for Vehicle-to-UAV (V2U) cooperative object perception. V2U4Real is collected using a ground vehicle and a UAV, each equipped with multi-view LiDARs and RGB cameras. The dataset covers urban streets, university campuses, and rural roads under diverse traffic scenarios, comprising over 56K LiDAR frames, 56K multi-view camera images, and 700K annotated 3D bounding boxes across four classes. To support a wide range of research tasks, we establish benchmarks for single-agent 3D object detection, cooperative 3D object detection, and object tracking. Comprehensive evaluations of several state-of-the-art models demonstrate the effectiveness of V2U cooperation in enhancing perception robustness and long-range awareness. The V2U4Real dataset and codebase are available at https://github.com/VjiaLi/V2U4Real.