Matching Skeleton-based Activity Representations with Heterogeneous Signals for HAR

📅 2025-03-17
📈 Citations: 0
Influential: 0
🤖 AI Summary
Problem: Aligning skeleton-based representations with heterogeneous sensor signals (e.g., IMU, WiFi) remains challenging in human activity recognition (HAR). Method: This paper proposes SKELAR, a unified activity representation framework anchored on skeleton-based physical motion cues. Specifically: (1) a self-supervised coarse angle reconstruction task recovers joint rotation angles to extract user- and deployment-agnostic motion knowledge; (2) a data-driven self-attention matching module dynamically emphasizes the body parts relevant to each downstream task; (3) the pretrained skeleton representations are matched with heterogeneous HAR signals, and synthetic skeleton data can stand in where no skeleton data has been collected. Contributions/Results: The method achieves state-of-the-art performance under both full-shot and few-shot settings. The paper also introduces MASD, described as the first broadly applicable HAR dataset with time-synchronized IMU, WiFi, and skeleton data, collected from 20 subjects performing 27 activities. Experiments further indicate that SKELAR can effectively leverage synthetic skeleton data.

📝 Abstract
In human activity recognition (HAR), activity labels have typically been encoded in one-hot format, although there has been a recent shift towards textual representations that provide contextual knowledge. Here, we argue that HAR should be anchored to physical motion data, as motion forms the basis of activity and applies effectively across sensing systems, whereas text is inherently limited. We propose SKELAR, a novel HAR framework that pretrains activity representations from skeleton data and matches them with heterogeneous HAR signals. Our method addresses two major challenges: (1) capturing core motion knowledge without context-specific details, which we achieve through a self-supervised coarse angle reconstruction task that recovers joint rotation angles, invariant to both users and deployments; and (2) adapting the representations to downstream tasks with varying modalities and focuses, for which we introduce a self-attention matching module that dynamically prioritizes relevant body parts in a data-driven manner. Given the lack of corresponding labels in existing skeleton data, we establish MASD, a new HAR dataset with IMU, WiFi, and skeleton data, collected from 20 subjects performing 27 activities. This is the first broadly applicable HAR dataset with time-synchronized data across three modalities. Experiments show that SKELAR achieves state-of-the-art performance in both full-shot and few-shot settings. We also demonstrate that SKELAR can effectively leverage synthetic skeleton data to extend its use to scenarios where no skeleton data has been collected.
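
To make the idea of a coarse angle target more concrete, here is a minimal sketch of how a joint angle could be computed from three 3D joint positions and then quantized into a few wide bins. This is an illustration only, not the paper's implementation: the function names `joint_angle` and `coarse_bin`, the use of the angle between two bones as a stand-in for a joint rotation angle, and the choice of six equal-width bins are all assumptions. The sketch does show why such a target can be insensitive to the user: scaling every coordinate (a taller or shorter body) leaves the angle unchanged.

```python
import math


def joint_angle(parent, joint, child):
    """Angle in degrees at `joint` between the bones joint->parent and joint->child.

    Hypothetical helper: each argument is an (x, y, z) position.
    """
    u = [p - j for p, j in zip(parent, joint)]
    v = [c - j for c, j in zip(child, joint)]
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    if norm == 0.0:
        raise ValueError("degenerate bone of zero length")
    # Clamp to guard against rounding slightly outside [-1, 1].
    cos = max(-1.0, min(1.0, dot / norm))
    return math.degrees(math.acos(cos))


def coarse_bin(angle_deg, num_bins=6):
    """Map an angle in [0, 180] to one of `num_bins` equal-width bins (assumed scheme)."""
    width = 180.0 / num_bins
    return min(int(angle_deg // width), num_bins - 1)
```

For example, `joint_angle((0, 1, 0), (0, 0, 0), (1, 0, 0))` measures the elbow angle of an arm bent at a right angle, and `coarse_bin` turns that value into a class index that a reconstruction head could be trained to predict.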

Problem

Research questions and friction points this paper is trying to address.

Anchoring HAR to physical motion data for cross-system effectiveness
Capturing core motion knowledge without context-specific details
Adapting representations to tasks with varying modalities and focuses

Innovation

Methods, ideas, or system contributions that make the work stand out.

SKELAR framework pretrains activity representations from skeleton data
Self-supervised coarse angle reconstruction task captures core motion knowledge
Self-attention matching module dynamically prioritizes relevant body parts
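
The matching idea in the list above can be sketched roughly as follows. This is a toy illustration under stated assumptions, not SKELAR's actual module: each activity label is assumed to come with one vector per body part, scaled dot-product self-attention is applied over those part vectors with identity query/key/value projections (a real module would learn them), the result is mean-pooled, and a sensor-signal embedding is matched to the label with the highest cosine similarity. All function names and the dictionary layout are hypothetical.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def self_attention_pool(parts):
    """Self-attention over body-part vectors, then mean pooling.

    Assumption: identity Q/K/V projections stand in for learned ones.
    """
    d = len(parts[0])
    attended = []
    for q in parts:
        weights = softmax([dot(q, k) / math.sqrt(d) for k in parts])
        attended.append([sum(w * v[i] for w, v in zip(weights, parts)) for i in range(d)])
    return [sum(row[i] for row in attended) / len(attended) for i in range(d)]


def cosine(u, v):
    nu, nv = math.sqrt(dot(u, u)), math.sqrt(dot(v, v))
    return dot(u, v) / (nu * nv) if nu and nv else 0.0


def match_activity(signal_embedding, label_parts):
    """Return (best_label, scores) for a signal embedding.

    `label_parts` maps a label name to its list of body-part vectors.
    """
    scores = {
        name: cosine(signal_embedding, self_attention_pool(parts))
        for name, parts in label_parts.items()
    }
    return max(scores, key=scores.get), scores
```

A call such as `match_activity([0.9, 0.1], {"wave": [[1.0, 0.0], [1.0, 0.0]], "kick": [[0.0, 1.0], [0.0, 1.0]]})` selects the label whose pooled part representation points in the most similar direction to the signal embedding. In a trained system the attention weights would be learned from downstream data, which is what lets the module emphasize different body parts for different tasks.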