Action Recognition Using Temporal Shift Module and Ensemble Learning

📅 2025-01-29
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the challenging task of recognizing 20 complex human actions in middle-school educational scenarios. To this end, we propose a multimodal video understanding framework built on the Temporal Shift Module (TSM), which adapts TSM to multimodal action recognition by jointly processing RGB frames, optical flow, and pose sequences. Cross-modal feature alignment and weighted fusion strategies strengthen temporal modeling and make decisions more robust, while transfer learning and backbone fine-tuning improve generalization. In the Multi-Modal Action Recognition Challenge of the ICPR 2024 Multi-Modal Visual Pattern Recognition Workshop, the method took first place with 100% top-1 accuracy on the test set, demonstrating its effectiveness for lightweight, education-oriented action recognition.
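The weighted fusion mentioned in the summary, and the modality ensemble described in the abstract, can be illustrated as score-level (late) fusion. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation: it assumes each modality-specific model outputs class logits for the same clips and combines them as a weighted average of softmax probabilities. The modality keys, the weights, and the function names `softmax` and `weighted_late_fusion` are hypothetical; the actual fusion rule and weights are not given here.

```python
import numpy as np


def softmax(logits: np.ndarray, axis: int = -1) -> np.ndarray:
    # Subtract the row maximum for numerical stability before exponentiating
    z = logits - logits.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def weighted_late_fusion(logits_by_modality: dict, weights: dict) -> np.ndarray:
    """Weighted average of per-modality class probabilities (hypothetical helper).

    logits_by_modality maps a modality name to an (N, K) logit array
    (N clips, K classes); weights maps the same names to non-negative floats.
    """
    total = sum(weights[m] for m in logits_by_modality)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    fused = sum(weights[m] * softmax(logits) for m, logits in logits_by_modality.items())
    # Dividing by the total weight keeps each row a probability distribution
    return fused / total


# Hypothetical logits for 2 clips and 3 classes from three modality-specific models
scores = {
    "rgb": np.array([[3.0, 0.0, 0.0], [0.0, 0.0, 3.0]]),
    "flow": np.array([[2.0, 0.0, 0.0], [0.0, 3.0, 0.0]]),
    "pose": np.array([[0.0, 1.0, 0.0], [0.0, 0.0, 2.0]]),
}
# Hypothetical weights; in practice they would be tuned on validation data
weights = {"rgb": 0.5, "flow": 0.3, "pose": 0.2}

fused = weighted_late_fusion(scores, weights)
predictions = fused.argmax(axis=1)  # one class index per clip
```

In this toy example the flow model disagrees on the second clip, but the weighted vote still selects class 2. Averaging probabilities rather than raw logits keeps a modality with a larger logit scale from dominating the ensemble.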

📝 Abstract
This paper presents the first-place solution for the Multi-Modal Action Recognition Challenge, part of the Multi-Modal Visual Pattern Recognition Workshop at the International Conference on Pattern Recognition (ICPR) 2024. The competition aimed to recognize human actions in a diverse dataset of 20 action classes collected from multi-modal sources. The proposed approach is built upon the Temporal Shift Module (TSM), a technique for efficiently capturing temporal dynamics in video data, and incorporates multiple data input types. Our strategy uses transfer learning to leverage pre-trained models, followed by careful fine-tuning on the challenge dataset to optimize performance on the 20 action classes. We selected a backbone network that balances computational efficiency and recognition accuracy, and further refined the model with an ensemble technique that integrates the outputs of different modalities. This ensemble proved crucial to boosting overall performance. Our solution achieved perfect top-1 accuracy on the test set, demonstrating the effectiveness of the proposed approach in recognizing human actions across 20 classes. Our code is available online at https://github.com/ffyyytt/TSM-MMVPR.
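The core operation of TSM, shifting part of the channels along the time axis so that each frame mixes in features from its neighbours, can be sketched in NumPy. This is a minimal illustration, not the code in the repository linked in the abstract: it assumes features laid out as (N, T, C, ...) (batch, sampled frames, channels, optional spatial dimensions) and uses the default from the original TSM paper (Lin et al.), shifting 1/8 of the channels one step toward earlier frames and another 1/8 toward later frames, with zero padding at the clip boundaries. The function name `temporal_shift` and the parameter `fold_div` are illustrative.

```python
import numpy as np


def temporal_shift(x: np.ndarray, fold_div: int = 8) -> np.ndarray:
    """Shift 1/fold_div of the channels each way along the time axis.

    x has shape (N, T, C, ...): batch, frames, channels, optional spatial dims.
    """
    if x.ndim < 3:
        raise ValueError("expected an array of shape (N, T, C, ...)")
    fold = x.shape[2] // fold_div
    out = np.zeros_like(x)  # zero padding where a shift runs off the clip
    # First fold: frame t takes these channels from frame t + 1
    out[:, :-1, :fold] = x[:, 1:, :fold]
    # Second fold: frame t takes these channels from frame t - 1
    out[:, 1:, fold:2 * fold] = x[:, :-1, fold:2 * fold]
    # Remaining channels are left in place
    out[:, :, 2 * fold:] = x[:, :, 2 * fold:]
    return out


# Toy input: 1 clip, 3 frames, 8 channels, where x[0, t, c] = 8 * t + c
x = np.arange(24, dtype=float).reshape(1, 3, 8)
y = temporal_shift(x, fold_div=8)
```

Because the shift only moves existing values, it adds temporal mixing at essentially no extra FLOPs; the TSM paper inserts it in a residual branch so that the unshifted features of the current frame are preserved.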
Problem

Research questions and friction points this paper is trying to address.

Temporal Shift Module
Action Recognition
Educational Application
Innovation

Methods, ideas, or system contributions that make the work stand out.

Temporal Shift Module
Transfer Learning
Multi-source Information Fusion