🤖 AI Summary
To address the insufficient robustness of human activity recognition in human-robot collaboration, this paper systematically compares fifteen daily-action recognition methods across three categories: IMU-based data gloves, vision-based tactile sensors, and their fusion. We propose, for the first time, a tactile–kinematic multimodal feature-level fusion framework that integrates single- or dual-stream visual-tactile feature extraction, IMU-based temporal modeling (LSTM/Transformer), and an online continuous-sequence processing mechanism. Experiments demonstrate that the framework achieves 92.3% accuracy in offline classification and improves F1-score by 8.3 percentage points over the best unimodal baseline in online continuous action recognition—significantly enhancing generalizability and real-time robustness. Our core contributions are: (1) empirical validation of the complementary nature of visual-tactile and kinematic modalities, and (2) establishment of the first end-to-end multimodal recognition paradigm explicitly designed for continuous human–robot interaction.
📝 Abstract
Human activity recognition (HAR) is essential for effective Human-Robot Collaboration (HRC), enabling robots to interpret and respond to human actions. This study evaluates the ability of a vision-based tactile sensor to classify 15 activities, comparing its performance to an IMU-based data glove. Additionally, we propose a multi-modal framework combining tactile and motion data to leverage their complementary strengths. We examined three approaches: motion-based classification (MBC) using IMU data, tactile-based classification (TBC) with single or dual video streams, and multi-modal classification (MMC) integrating both. Offline validation on segmented datasets assessed each configuration's accuracy under controlled conditions, while online validation on continuous action sequences tested online performance. Results showed the multi-modal approach consistently outperformed single-modality methods, highlighting the potential of integrating tactile and motion sensing to enhance HAR systems for collaborative robotics.