🤖 AI Summary
This work addresses representation learning for human activity recognition (HAR) from wrist-worn inertial measurement unit (IMU) signals under severe label scarcity. It brings the submovement theory of motor control into wearable self-supervised learning for the first time, using biologically grounded movement segments as tokens. Acceleration signals are segmented and tokenized according to submovement theory, and a Transformer encoder is pretrained via masked movement-segment reconstruction so that it captures the temporal structure of activities. Pretrained on the NHANES dataset, the approach outperforms existing wearable self-supervised baselines on six subject-disjoint HAR benchmarks and is more data-efficient in low-label regimes.
📝 Abstract
Wearable accelerometers have enabled large-scale health and wellness monitoring, yet learning robust human-activity representations has been constrained by the scarcity of labeled data. While self-supervised learning offers a potential remedy, existing approaches treat sensor streams as unstructured time series, overlooking the underlying biological structure of human movement, a factor we argue is critical for effective Human Activity Recognition (HAR). We introduce a novel tokenization strategy grounded in the submovement theory of motor control, which posits that continuous wrist motion is composed of superposed elementary basis functions called submovements. We define our token as the movement segment, a unit of motion composed of a finite sequence of submovements that is readily extractable from wrist accelerometer signals. By treating these segments as tokens, we pretrain a Transformer encoder via masked movement-segment reconstruction to model the temporal dependencies of movement segments, shifting the learning focus beyond local waveform morphology. Pretrained on the NHANES corpus (approximately 28k hours; approximately 11k participants; approximately 10M windows), our representations outperform strong wearable SSL baselines across six subject-disjoint HAR benchmarks. Furthermore, they demonstrate stronger data efficiency in data-scarce settings. Code and pretrained weights will be made publicly available.
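To make the pretraining objective concrete, the sketch below illustrates the general idea of treating variable-length segments as tokens and masking some of them for reconstruction. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not specify how movement segments are extracted, so a zero-crossing boundary rule stands in for it here, and the function names, the fixed token length, and the mask ratio are all hypothetical. The Transformer encoder itself is omitted; only the tokenization, masking, and masked-position loss are shown.

```python
import numpy as np


def segment_at_zero_crossings(signal):
    """Split a 1-D signal wherever its sign changes.

    ASSUMPTION: a stand-in boundary rule for illustration only; the
    source does not state how movement segments are delimited.
    """
    signal = np.asarray(signal, dtype=float)
    sign = np.sign(signal)
    # Index i is a boundary when sample i has a different sign than sample i-1.
    boundaries = np.flatnonzero(sign[1:] != sign[:-1]) + 1
    return np.split(signal, boundaries)


def resample_segment(segment, token_len=8):
    """Linearly resample a variable-length segment to a fixed-length token."""
    if len(segment) == 1:
        return np.full(token_len, segment[0])
    old_grid = np.linspace(0.0, 1.0, num=len(segment))
    new_grid = np.linspace(0.0, 1.0, num=token_len)
    return np.interp(new_grid, old_grid, segment)


def mask_tokens(tokens, mask_ratio=0.5, seed=0):
    """Zero out a random subset of tokens; return corrupted tokens and the mask.

    ASSUMPTION: mask_ratio and zero-filling are hypothetical choices.
    """
    rng = np.random.default_rng(seed)
    n_tokens = tokens.shape[0]
    n_masked = max(1, int(mask_ratio * n_tokens))
    masked_idx = rng.choice(n_tokens, size=n_masked, replace=False)
    mask = np.zeros(n_tokens, dtype=bool)
    mask[masked_idx] = True
    corrupted = tokens.copy()
    corrupted[mask] = 0.0
    return corrupted, mask


def masked_mse(pred, target, mask):
    """Mean squared reconstruction error over the masked tokens only."""
    return float(np.mean((pred[mask] - target[mask]) ** 2))


if __name__ == "__main__":
    # Toy 1-D signal; a real pipeline would start from wrist accelerometer data.
    toy = np.sin(np.linspace(0.0, 6.0 * np.pi, 200))
    tokens = np.stack([resample_segment(s) for s in segment_at_zero_crossings(toy)])
    corrupted, mask = mask_tokens(tokens)
    # An encoder would map `corrupted` to a prediction; the loss is then
    # computed against the original tokens at the masked positions.
    print(tokens.shape, int(mask.sum()), masked_mse(corrupted, tokens, mask))
```

Computing the loss only at masked positions means the model is scored on what it infers from neighboring segments rather than on copying visible input, which matches the abstract's stated aim of modeling temporal dependencies between segments rather than local waveform shape.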