🤖 AI Summary
This work addresses the multi-robot activity recognition (MRAR) problem by proposing the first multimodal sensing framework integrating WiFi channel state information (CSI), multi-view video, and multichannel audio. Unlike conventional vision-centric approaches, it innovatively adopts WiFi CSI as the primary non-intrusive sensing modality and introduces a tri-modal time-synchronized acquisition and calibration system to enable robust cross-modal modeling. To support reproducible research, we construct RoboMNIST—a high-quality, temporally aligned, and finely annotated multimodal benchmark dataset built on the Franka Emika dual-arm robot platform—the first of its kind for MRAR. Extensive experiments demonstrate that our method significantly improves activity recognition accuracy and cross-scenario generalization capability. This work establishes a new paradigm for multi-robot autonomous decision-making and environmental understanding through synergistic multimodal perception.
📝 Abstract
We introduce a novel dataset for multi-robot activity recognition (MRAR) using two robotic arms integrating WiFi channel state information (CSI), video, and audio data. This multimodal dataset utilizes signals of opportunity, leveraging existing WiFi infrastructure to provide detailed indoor environmental sensing without additional sensor deployment. Data were collected using two Franka Emika robotic arms, complemented by three cameras, three WiFi sniffers to collect CSI, and three microphones capturing distinct yet complementary audio data streams. The combination of CSI, visual, and auditory data can enhance robustness and accuracy in MRAR. This comprehensive dataset enables a holistic understanding of robotic environments, facilitating advanced autonomous operations that mimic human-like perception and interaction. By repurposing ubiquitous WiFi signals for environmental sensing, this dataset offers significant potential aiming to advance robotic perception and autonomous systems. It provides a valuable resource for developing sophisticated decision-making and adaptive capabilities in dynamic environments.