Scaling Human Activity Recognition: A Comparative Evaluation of Synthetic Data Generation and Augmentation Techniques

📅 2025-06-09
🤖 AI Summary
To address the scarcity of accurately labeled inertial measurement unit (IMU) data for human activity recognition (HAR), this work benchmarks two cross-modal virtual IMU generation approaches, one video-driven and one language-driven, against conventional sensor-level data augmentation. The authors construct a large-scale virtual IMU dataset covering 100 activities from Kinetics-400, with sensor signals simulated at 22 body locations. The three data generation strategies are evaluated with four popular HAR models on three benchmarks (UTD-MHAD, PAMAP2, and HAD-AW). Virtual IMU data significantly improves performance over real or augmented data alone, particularly under limited-data conditions. The study also offers practical guidance on choosing a data generation strategy and outlines the advantages and disadvantages of each approach.

📝 Abstract
Human activity recognition (HAR) is often limited by the scarcity of labeled datasets due to the high cost and complexity of real-world data collection. To mitigate this, recent work has explored generating virtual inertial measurement unit (IMU) data via cross-modality transfer. While video-based and language-based pipelines have each shown promise, they differ in assumptions and computational cost. Moreover, their effectiveness relative to traditional sensor-level data augmentation remains unclear. In this paper, we present a direct comparison of these two virtual IMU generation approaches against classical data augmentation techniques. We construct a large-scale virtual IMU dataset spanning 100 diverse activities from Kinetics-400 and simulate sensor signals at 22 body locations. The three data generation strategies are evaluated on benchmark HAR datasets (UTD-MHAD, PAMAP2, HAD-AW) using four popular models. Results show that virtual IMU data significantly improves performance over real or augmented data alone, particularly under limited-data conditions. We offer practical guidance on choosing data generation strategies and highlight the distinct advantages and disadvantages of each approach.
Problem

Research questions and friction points this paper is trying to address.

Addressing scarcity of labeled HAR datasets via synthetic data generation
Comparing video-based and language-based virtual IMU generation methods
Evaluating effectiveness against traditional sensor-level data augmentation techniques
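
For readers unfamiliar with sensor-level augmentation, the sketch below shows two transforms that are commonly used on IMU windows: jittering (additive Gaussian noise) and scaling (one random gain per axis). It is only an illustration. The summary does not say which augmentations the paper evaluates, and the function names, noise levels, and sample window here are assumptions chosen for the example.

```python
import random

def jitter(window, sigma=0.05, rng=None):
    """Add zero-mean Gaussian noise to every axis of every sample."""
    if rng is None:
        rng = random.Random()
    return [[v + rng.gauss(0.0, sigma) for v in sample] for sample in window]

def scale(window, sigma=0.1, rng=None):
    """Multiply each axis by a single random factor drawn around 1.0."""
    if rng is None:
        rng = random.Random()
    n_axes = len(window[0])
    factors = [rng.gauss(1.0, sigma) for _ in range(n_axes)]
    return [[v * f for v, f in zip(sample, factors)] for sample in window]

# Hypothetical 4-sample, 3-axis accelerometer window (values in g).
window = [
    [0.0, 0.0, 1.0],
    [0.1, 0.0, 0.9],
    [0.2, -0.1, 1.1],
    [0.1, 0.0, 1.0],
]

rng = random.Random(0)  # fixed seed so the example is repeatable
augmented = scale(jitter(window, rng=rng), rng=rng)
```

Both transforms keep the window shape and the activity label unchanged, which is why they are cheap to apply but can only produce variations of recordings that already exist.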
Innovation

Methods, ideas, or system contributions that make the work stand out.

Generates virtual IMU data via cross-modality transfer
Compares video-based and language-based synthetic pipelines
Evaluates against classical sensor-level data augmentation
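
As a rough intuition for video-based cross-modality transfer, the sketch below derives a virtual accelerometer signal from a tracked 3D joint trajectory using central second differences. This is a simplified assumption about how such pipelines typically work, not the paper's implementation: it ignores gravity, sensor orientation, and sensor noise, and the function name and the sample trajectory are hypothetical.

```python
def virtual_acceleration(positions, fps):
    """Approximate linear acceleration (m/s^2) from 3D joint positions (m)
    sampled at `fps` frames per second, using central second differences.
    The first and last frames are dropped because they lack a neighbor."""
    dt = 1.0 / fps
    acc = []
    for t in range(1, len(positions) - 1):
        prev, cur, nxt = positions[t - 1], positions[t], positions[t + 1]
        acc.append([(n - 2.0 * c + p) / (dt * dt)
                    for p, c, n in zip(prev, cur, nxt)])
    return acc

# Hypothetical wrist trajectory: constant 2 m/s^2 acceleration along x,
# no motion along y or z, sampled at 10 frames per second.
fps = 10
positions = [[0.5 * 2.0 * (i / fps) ** 2, 0.0, 1.0] for i in range(6)]
acc = virtual_acceleration(positions, fps)
```

For this trajectory every output sample is approximately `[2.0, 0.0, 0.0]`, up to floating-point rounding. A realistic pipeline would also need to rotate the result into the sensor's local frame and add the gravity component before the signal resembles real IMU output.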
Authors

Zikang Leng
Georgia Institute of Technology
machine learning, computer vision, human activity recognition, time series data
Archith Iyer
Georgia Institute of Technology, Atlanta, Georgia, USA
T. Plotz
Georgia Institute of Technology, Atlanta, Georgia, USA