V2U4Real: A Real-world Large-scale Dataset for Vehicle-to-UAV Cooperative Perception

📅 2026-03-26
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses core limitations of ground-based autonomous driving perception, particularly occlusion, blind spots, and limited sensing range, which existing vehicle-to-vehicle (V2V) and vehicle-to-infrastructure (V2I) cooperative systems struggle to overcome in large-scale occlusion and long-range scenarios. To bridge this gap, the authors propose and release V2U4Real, the first large-scale, real-world vehicle-to-UAV (V2U) cooperative perception dataset, integrating multi-view LiDAR and RGB cameras on both a ground vehicle and an aerial drone. The dataset comprises 56K synchronized multimodal frames collected across urban, campus, and rural road environments, annotated with 700K 3D bounding boxes. Leveraging this resource, the authors establish benchmarks for both single-agent and V2U cooperative 3D detection and tracking, demonstrating that cross-perspective collaboration significantly improves perception robustness and long-range performance, thereby filling a critical data gap in aerial-ground cooperative perception research.

📝 Abstract
Modern autonomous vehicle perception systems are often constrained by occlusions, blind spots, and limited sensing range. While existing cooperative perception paradigms, such as Vehicle-to-Vehicle (V2V) and Vehicle-to-Infrastructure (V2I), have demonstrated their effectiveness in mitigating these challenges, they remain limited to ground-level collaboration and cannot fully address large-scale occlusions or long-range perception in complex environments. To advance research in cross-view cooperative perception, we present V2U4Real, the first large-scale real-world multi-modal dataset for Vehicle-to-UAV (V2U) cooperative object perception. V2U4Real is collected by a ground vehicle and a UAV equipped with multi-view LiDARs and RGB cameras. The dataset covers urban streets, university campuses, and rural roads under diverse traffic scenarios, comprising over 56K LiDAR frames, 56K multi-view camera images, and 700K annotated 3D bounding boxes across four classes. To support a wide range of research tasks, we establish benchmarks for single-agent 3D object detection, cooperative 3D object detection, and object tracking. Comprehensive evaluations of several state-of-the-art models demonstrate the effectiveness of V2U cooperation in enhancing perception robustness and long-range awareness. The V2U4Real dataset and codebase are available at https://github.com/VjiaLi/V2U4Real.
Problem

Research questions and friction points this paper is trying to address.

occlusion
blind spots
limited sensing range
cooperative perception
long-range perception
Innovation

Methods, ideas, or system contributions that make the work stand out.

Vehicle-to-UAV (V2U)
cooperative perception
large-scale real-world dataset
multi-modal sensing
3D object detection
Authors
Weijia Li, Haoen Xiang, Tianxu Wang, Shuaibing Wu, Qiming Xia, Cheng Wang
Fujian Key Laboratory of Urban Intelligent Sensing and Computing, Xiamen University, China; Key Laboratory of Multimedia Trusted Perception and Efficient Computing, Ministry of Education of China, Xiamen University, China
Chenglu Wen
Professor of Xiamen University. Research interests: 3D vision, point clouds, mobile mapping, robotics.