🤖 AI Summary
Existing wearable IMU-based human activity recognition (HAR) methods for movement disorders (e.g., Parkinson’s disease) suffer from poor out-of-distribution (OOD) generalization and heavy reliance on labeled data.
Method: We propose an IMU-video cross-modal self-supervised pretraining framework that leverages large-scale unlabeled multimodal data. It employs cross-modal contrastive learning and spatiotemporal alignment to learn robust, disentangled motion representations—without requiring task-specific annotations.
Contribution/Results: The framework significantly improves generalization to unseen environments and populations. Experiments demonstrate superior zero-shot and few-shot transfer performance on multiple OOD IMU benchmarks, outperforming both IMU-only and existing IMU-video pretraining approaches. By enabling accurate, continuous, and low-burden detection of abnormal movements from remote monitoring data, the framework supports scalable, real-world deployment in digital health.
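The summary above mentions cross-modal contrastive learning between time-aligned IMU and video clips. The paper's exact loss is not given here; the following is a minimal sketch of the standard symmetric InfoNCE objective commonly used for such cross-modal pretraining, where clip `i` from each modality forms the positive pair and other clips in the batch serve as negatives (all names, shapes, and the temperature value are illustrative assumptions, not the authors' implementation):

```python
import numpy as np

def info_nce(imu_emb, vid_emb, temperature=0.07):
    """Symmetric InfoNCE between paired IMU and video embeddings.

    imu_emb, vid_emb: (N, D) arrays; row i of each modality is assumed to
    come from the same time-aligned clip (the positive pair), while the
    other N-1 rows in the batch act as negatives.
    """
    # L2-normalize so dot products become cosine similarities
    imu = imu_emb / np.linalg.norm(imu_emb, axis=1, keepdims=True)
    vid = vid_emb / np.linalg.norm(vid_emb, axis=1, keepdims=True)

    logits = imu @ vid.T / temperature       # (N, N) similarity matrix
    labels = np.arange(len(logits))          # positives lie on the diagonal

    def cross_entropy(l, y):
        l = l - l.max(axis=1, keepdims=True)  # numerical stability
        log_probs = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -log_probs[np.arange(len(y)), y].mean()

    # Symmetric: IMU->video retrieval plus video->IMU retrieval
    return 0.5 * (cross_entropy(logits, labels) + cross_entropy(logits.T, labels))
```

Minimizing this loss pulls each IMU clip's embedding toward its paired video embedding and pushes it away from the other clips in the batch, which is one common way such frameworks learn representations without task-specific labels.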
📝 Abstract
Human Activity Recognition (HAR) based on wearable inertial sensors plays a critical role in remote health monitoring. In patients with movement disorders, the ability to detect abnormal movements in patients' home environments can enable continuous optimization of treatments and help alert caretakers as needed. Machine learning approaches have been proposed for HAR tasks using Inertial Measurement Unit (IMU) data; however, most rely on application-specific labels and lack generalizability to data collected in different environments or populations. To address this limitation, we propose a new cross-modal self-supervised pretraining approach to learn representations from large-scale unlabeled IMU-video data and demonstrate improved generalizability in HAR tasks on out-of-distribution (OOD) IMU datasets, including a dataset collected from patients with Parkinson's disease. Specifically, our results indicate that the proposed cross-modal pretraining approach outperforms the current state-of-the-art IMU-video pretraining approach and IMU-only pretraining under zero-shot and few-shot evaluations. Broadly, our study provides evidence that for highly dynamic data modalities, such as IMU signals, cross-modal pretraining may be a useful tool to learn generalizable data representations. Our software is available at https://github.com/scheshmi/IMU-Video-OOD-HAR.