🤖 AI Summary
To address the scarcity of labeled inertial measurement unit (IMU) data for human activity recognition (HAR), this work directly compares two cross-modal virtual IMU generation approaches, video-based and language-based, against classical sensor-level data augmentation. The authors construct a large-scale virtual IMU dataset spanning 100 activities drawn from Kinetics-400, with sensor signals simulated at 22 body locations. The three data generation strategies are evaluated on three benchmark HAR datasets (UTD-MHAD, PAMAP2, and HAD-AW) using four popular models. Results show that virtual IMU data significantly improves performance over real or augmented data alone, particularly under limited-data conditions. The study also offers practical guidance on choosing a data generation strategy and outlines the advantages and disadvantages of each approach.
📝 Abstract
Human activity recognition (HAR) is often limited by the scarcity of labeled datasets, owing to the high cost and complexity of real-world data collection. To mitigate this, recent work has explored generating virtual inertial measurement unit (IMU) data via cross-modality transfer. While video-based and language-based pipelines have each shown promise, they differ in their assumptions and computational cost. Moreover, their effectiveness relative to traditional sensor-level data augmentation remains unclear. In this paper, we directly compare these two virtual IMU generation approaches with each other and against classical data augmentation techniques. We construct a large-scale virtual IMU dataset spanning 100 diverse activities from Kinetics-400 and simulate sensor signals at 22 body locations. The three data generation strategies are evaluated on three benchmark HAR datasets (UTD-MHAD, PAMAP2, HAD-AW) using four popular models. Results show that virtual IMU data significantly improves performance over real or augmented data alone, particularly under limited-data conditions. We offer practical guidance on choosing a data generation strategy and highlight the distinct advantages and disadvantages of each approach.
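For readers unfamiliar with the sensor-level augmentation baseline mentioned above, the following is a minimal sketch of two transforms commonly used on IMU windows: additive jitter and per-axis scaling. It is not taken from the paper. The function names, the noise parameters, and the toy accelerometer window are illustrative assumptions, and the abstract does not state which augmentations the authors actually used.

```python
import random


def jitter(window, sigma=0.05, rng=None):
    """Add zero-mean Gaussian noise to every value (sigma is an assumed default)."""
    rng = rng or random.Random()
    return [[v + rng.gauss(0.0, sigma) for v in sample] for sample in window]


def scale(window, sigma=0.1, rng=None):
    """Multiply each axis by one random factor drawn around 1.0 (assumed default)."""
    rng = rng or random.Random()
    n_axes = len(window[0])
    factors = [rng.gauss(1.0, sigma) for _ in range(n_axes)]
    return [[v * f for v, f in zip(sample, factors)] for sample in window]


# Hypothetical toy window: 4 time steps of a 3-axis accelerometer at rest.
rng = random.Random(0)
window = [[0.0, 0.0, 9.81] for _ in range(4)]
augmented = scale(jitter(window, rng=rng), rng=rng)
```

Each call returns a new window with the same shape as the input, so an augmented copy can be added to a training set under the original activity label.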