Scaling Datasets for Multi-Sensor, Multi-Agent, and Multi-Domain Learning in Autonomous Systems

📅 2026-06-03

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Existing autonomous driving datasets lack the diversity, coordination, and cross-domain coverage needed to train multi-agent, multi-sensor systems. To address this gap, this work proposes a modular data generation pipeline built on the AVstack framework and the CARLA simulator that produces terabyte-scale, ground-truth-annotated multimodal data. The pipeline covers ground vehicles, aerial platforms, and infrastructure sensors, and supports single- or multi-agent configurations with flexible sensor suites under controllable, challenging conditions. Representative perception and sensor fusion studies illustrate how the generated data can support application-specific training and collaborative autonomy.

📝 Abstract

Existing datasets cannot support large-scale learning in multi-agent, multi-sensor, or multi-domain autonomy, where diversity and coordination are essential. We present a modular dataset generation pipeline that creates terabyte-scale, ground-truth-labeled data for ground, aerial, and infrastructure-based systems using the AVstack framework and CARLA simulator. Supporting single- and multi-agent configurations with flexible sensor suites, the pipeline enables controllable experimentation across challenging conditions. Representative perception and fusion studies show how generated data can support application-specific training and collaborative autonomy.
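
To make the idea of "single- and multi-agent configurations with flexible sensor suites" more concrete, the following is a minimal Python sketch of how such a scenario specification might be organized. It is not the paper's pipeline and does not use the AVstack or CARLA APIs; every class, function, and field name here (`Sensor`, `Agent`, `Scenario`, `build_scenarios`, `sensor_streams`) is hypothetical, and the town and weather strings are placeholder labels.

```python
from dataclasses import dataclass
from itertools import product

# All names below are hypothetical illustrations, not AVstack or CARLA APIs.

@dataclass(frozen=True)
class Sensor:
    name: str
    modality: str  # e.g. "camera" or "lidar"

@dataclass(frozen=True)
class Agent:
    name: str
    domain: str  # "ground", "aerial", or "infrastructure"
    sensors: tuple

@dataclass(frozen=True)
class Scenario:
    town: str
    weather: str
    agents: tuple

def build_scenarios(towns, weathers, agents):
    """Create one scenario per (town, weather) pair, all sharing the same agents."""
    return [Scenario(t, w, tuple(agents)) for t, w in product(towns, weathers)]

def sensor_streams(scenario):
    """Count the per-sensor data streams a scenario would record."""
    return sum(len(agent.sensors) for agent in scenario.agents)

camera = Sensor("front_camera", "camera")
lidar = Sensor("roof_lidar", "lidar")

# One agent per domain named in the abstract, each with its own sensor suite.
agents = [
    Agent("ego_car", "ground", (camera, lidar)),
    Agent("drone", "aerial", (camera,)),
    Agent("intersection_pole", "infrastructure", (camera, lidar)),
]

# Placeholder town and weather labels; a real pipeline would map these to simulator settings.
scenarios = build_scenarios(["Town01", "Town02"], ["clear", "rain"], agents)
```

A single-agent configuration is the same structure with one entry in `agents`. Keeping the scenario, agent, and sensor descriptions as separate immutable records is one way to vary conditions independently of the sensor suite.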

Problem

Research questions and friction points this paper is trying to address.

multi-agent

multi-sensor

multi-domain

autonomous systems

large-scale learning

Innovation

Methods, ideas, or system contributions that make the work stand out.

multi-agent

multi-sensor

dataset generation