🤖 AI Summary
Real-world IMU-based human activity recognition (HAR) faces two key challenges: the scarcity of comprehensive datasets covering diverse activities, and the lack of model interpretability. To address these, we propose SEZ-HARN, the first self-explanatory zero-shot HAR framework, capable of recognizing unseen activities without target-label supervision while simultaneously generating semantically consistent skeleton videos as interpretable evidence. Our approach integrates zero-shot learning, cross-modal feature alignment, and an action generation network to enable end-to-end mapping from raw IMU signals to skeletal motion sequences. Evaluated on four benchmark datasets, SEZ-HARN achieves recognition accuracy competitive with state-of-the-art black-box models (e.g., within 3% of the best on PAMAP2) while providing intuitive, realistic, and human-understandable visual explanations. To our knowledge, this is the first work to unify high recognition performance with strong, intrinsic interpretability in zero-shot HAR.
📝 Abstract
Human Activity Recognition (HAR), which uses data from Inertial Measurement Unit (IMU) sensors, has many practical applications in healthcare and assisted living environments. However, its use in real-world scenarios has been limited by the lack of comprehensive IMU-based HAR datasets covering a wide range of activities and by the lack of transparency in existing HAR models. Zero-shot HAR (ZS-HAR) overcomes the data limitation, but current models struggle to explain their decisions, making them less transparent. This paper introduces a novel IMU-based ZS-HAR model called the Self-Explainable Zero-shot Human Activity Recognition Network (SEZ-HARN). It can recognize activities not encountered during training and provides skeleton videos to explain its decision-making process. We evaluate the effectiveness of the proposed SEZ-HARN on four benchmark datasets (PAMAP2, DaLiAc, UTD-MHAD, and MHealth) and compare its performance against three state-of-the-art black-box ZS-HAR models. The experimental results demonstrate that SEZ-HARN produces realistic and understandable explanations while achieving competitive zero-shot recognition accuracy. SEZ-HARN achieves a zero-shot prediction accuracy within 3% of the best-performing black-box model on PAMAP2 while maintaining comparable performance on the other three datasets.