RoboMNIST: A Multimodal Dataset for Multi-Robot Activity Recognition Using WiFi Sensing, Video, and Audio

📅 2024-08-29
🏛️ arXiv.org
🤖 AI Summary
This work addresses multi-robot activity recognition (MRAR) with a multimodal dataset that integrates WiFi channel state information (CSI), video, and audio. Rather than relying on vision alone, it treats WiFi CSI as a signal of opportunity, reusing existing WiFi infrastructure for indoor sensing without additional sensor deployment. The resulting dataset, RoboMNIST, was collected with two Franka Emika robotic arms, three cameras, three WiFi sniffers, and three microphones. The authors argue that combining CSI, visual, and auditory data can improve the robustness and accuracy of MRAR, and they position the dataset as a resource for research on robotic perception, decision-making, and adaptation in dynamic environments.

📝 Abstract
We introduce a novel dataset for multi-robot activity recognition (MRAR) with two robotic arms, integrating WiFi channel state information (CSI), video, and audio data. This multimodal dataset utilizes signals of opportunity, leveraging existing WiFi infrastructure to provide detailed indoor environmental sensing without additional sensor deployment. Data were collected using two Franka Emika robotic arms, complemented by three cameras, three WiFi sniffers to collect CSI, and three microphones capturing distinct yet complementary audio data streams. The combination of CSI, visual, and auditory data can enhance robustness and accuracy in MRAR. This comprehensive dataset enables a holistic understanding of robotic environments, facilitating advanced autonomous operations that mimic human-like perception and interaction. By repurposing ubiquitous WiFi signals for environmental sensing, this dataset offers significant potential to advance robotic perception and autonomous systems. It provides a valuable resource for developing sophisticated decision-making and adaptive capabilities in dynamic environments.
Problem

Research questions and friction points this paper is trying to address.

Multi-robot activity recognition using multimodal data
Leveraging WiFi CSI for environmental sensing
Enhancing robotic perception with audio and video integration
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multimodal dataset with WiFi CSI
Integrates video and audio data
Enhances robot activity recognition robustness
Kian Behzad
Department of Electrical & Computer Engineering, Northeastern University, Boston, MA, USA
Rojin Zandi
Department of Electrical & Computer Engineering, Northeastern University, Boston, MA, USA
Elaheh Motamedi
Department of Electrical & Computer Engineering, Northeastern University, Boston, MA, USA
Hojjat Salehinejad
Mayo Clinic | University of Toronto
Machine Learning, Statistical Signal Processing, Wireless Sensing, AI in Healthcare
Milad Siami
Associate Professor of ECE, Northeastern University
Multi-agent systems, Network science, Perception and robotics, Systems and control, Distributed