🤖 AI Summary
Existing autonomous driving datasets lack sufficient diversity, coordination, and cross-domain support, which limits their usefulness for training multi-agent, multi-sensor systems. To address this gap, this work proposes a modular data generation pipeline, built on the AVstack framework and the CARLA simulator, that efficiently produces terabyte-scale multimodal data annotated with ground truth. The pipeline covers viewpoints from ground vehicles, aerial platforms, and infrastructure sensors, and supports flexible single- or multi-agent configurations in controllable, complex scenarios. This approach represents the first scalable, cross-domain collaborative data generation methodology for autonomous driving. It makes training data more customizable and improves the training efficacy and practical applicability of perception and sensor fusion models in cooperative autonomous systems.
📝 Abstract
Existing datasets cannot support large-scale learning in multi-agent, multi-sensor, or multi-domain autonomy, where diversity and coordination are essential. We present a modular dataset generation pipeline that creates terabyte-scale, ground-truth-labeled data for ground, aerial, and infrastructure-based systems using the AVstack framework and CARLA simulator. Supporting single- and multi-agent configurations with flexible sensor suites, the pipeline enables controllable experimentation across challenging conditions. Representative perception and fusion studies show how generated data can support application-specific training and collaborative autonomy.