Improving Out-of-distribution Human Activity Recognition via IMU-Video Cross-modal Representation Learning

📅 2025-07-17
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing wearable IMU-based human activity recognition (HAR) methods for movement disorders (e.g., Parkinson's disease) suffer from poor out-of-distribution (OOD) generalization and heavy reliance on labeled data. Method: The authors propose a cross-modal self-supervised pretraining framework that learns IMU representations from large-scale unlabeled IMU-video data, without requiring task-specific annotations. Contribution/Results: The learned representations generalize better to unseen environments and populations: under zero-shot and few-shot evaluations on multiple OOD IMU benchmarks, including data from patients with Parkinson's disease, the framework outperforms both IMU-only pretraining and the current state-of-the-art IMU-video pretraining approach. By enabling accurate, continuous, and low-burden detection of abnormal movements from remote monitoring data, the method supports scalable, real-world deployment in digital health.

📝 Abstract
Human Activity Recognition (HAR) based on wearable inertial sensors plays a critical role in remote health monitoring. In patients with movement disorders, the ability to detect abnormal patient movements in their home environments can enable continuous optimization of treatments and help alert caretakers as needed. Machine learning approaches have been proposed for HAR tasks using Inertial Measurement Unit (IMU) data; however, most rely on application-specific labels and lack generalizability to data collected in different environments or populations. To address this limitation, we propose a new cross-modal self-supervised pretraining approach to learn representations from large-scale unlabeled IMU-video data and demonstrate improved generalizability in HAR tasks on out-of-distribution (OOD) IMU datasets, including a dataset collected from patients with Parkinson's disease. Specifically, our results indicate that the proposed cross-modal pretraining approach outperforms the current state-of-the-art IMU-video pretraining approach and IMU-only pretraining under zero-shot and few-shot evaluations. Broadly, our study provides evidence that in highly dynamic data modalities, such as IMU signals, cross-modal pretraining may be a useful tool to learn generalizable data representations. Our software is available at https://github.com/scheshmi/IMU-Video-OOD-HAR.
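The abstract describes cross-modal self-supervised pretraining on paired IMU-video data. A common way to implement such pretraining is a CLIP-style symmetric InfoNCE objective, where time-aligned IMU and video clips form positive pairs and all other clips in the batch act as negatives. The sketch below is illustrative only — the function names, temperature value, and the use of InfoNCE are assumptions, not the paper's confirmed implementation:

```python
import numpy as np

def l2_normalize(x, eps=1e-8):
    """Scale each embedding row to unit length."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)

def cross_modal_infonce(imu_emb, video_emb, temperature=0.07):
    """Symmetric InfoNCE loss over a batch of paired embeddings.

    imu_emb, video_emb: (B, D) arrays; row i of each matrix embeds
    the same time-aligned clip, so pairs (i, i) are positives and
    all off-diagonal pairs serve as in-batch negatives.
    """
    z_i = l2_normalize(imu_emb)
    z_v = l2_normalize(video_emb)
    logits = z_i @ z_v.T / temperature      # (B, B) cosine similarities
    idx = np.arange(logits.shape[0])

    def cross_entropy(lg):
        lg = lg - lg.max(axis=1, keepdims=True)  # numerical stability
        logp = lg - np.log(np.exp(lg).sum(axis=1, keepdims=True))
        return -logp[idx, idx].mean()            # diagonal = positives

    # Average the IMU->video and video->IMU directions.
    return 0.5 * (cross_entropy(logits) + cross_entropy(logits.T))
```

Minimizing this loss pulls each IMU clip toward its paired video clip in the shared embedding space; the IMU encoder can then be evaluated zero-shot or fine-tuned few-shot on downstream HAR labels, as in the evaluations described above.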
Problem

Research questions and friction points this paper is trying to address.

Improving the generalizability of Human Activity Recognition (HAR) models to unseen data
Reducing the reliance of IMU-based methods on application-specific labels
Poor transfer of IMU models to out-of-distribution (OOD) data from different environments or populations
Innovation

Methods, ideas, or system contributions that make the work stand out.

Cross-modal self-supervised pretraining for HAR
Leveraging unlabeled IMU-video data
Improved generalizability in OOD datasets
Seyyed Saeid Cheshmi
Department of Computer Science & Engineering, University of Minnesota
Buyao Lyu
Department of Mechanical Engineering, University of Minnesota
Thomas Lisko
Department of Neurosurgery, University of Minnesota
Rajesh Rajamani
Professor, University of Minnesota
Estimation, sensing, and control systems
Robert A. McGovern
Assistant Professor of Neurosurgery, University of Minnesota
Yogatheesan Varatharajah
University of Minnesota Twin Cities
Healthcare Analytics, Machine Learning, Trustworthy AI