Online Micro-gesture Recognition Using Data Augmentation and Spatial-Temporal Attention

📅 2025-07-13
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work addresses the online recognition of multi-instance micro-gestures in untrimmed videos, aiming to precisely localize their spatiotemporal boundaries and discriminate fine-grained categories. The task is challenging due to the temporal density of micro-gestures, ambiguous spatiotemporal boundaries, and minimal inter-class discriminability. To tackle these challenges, we propose an end-to-end framework integrating handcrafted data augmentation with spatiotemporal attention mechanisms: motion-sensitive augmentation enhances modeling of subtle dynamic changes, while a joint spatial-temporal attention module enables focal region selection and discriminative temporal dynamics learning. Evaluated on the IJCAI 2025 MiGA Challenge, our method achieves a state-of-the-art F1-score of 38.03—surpassing the prior best by 37.9%—demonstrating significant improvements in both accuracy and robustness for online micro-gesture detection.
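The joint spatial-temporal attention described above can be sketched in code. The snippet below is an illustrative NumPy mock-up, not the authors' implementation: the paper's actual module, feature shapes, and learned scoring functions are not given here, so the scores are simply taken as mean activations.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def spatial_temporal_attention(feats):
    """Reweight a clip feature map (T, H, W, C) with spatial, then temporal attention.

    Illustrative only: a trained model would produce the attention scores with
    learned projections rather than raw mean activations.
    """
    T, H, W, C = feats.shape
    # Spatial attention: one weight per location within each frame.
    spatial = softmax(feats.mean(axis=-1).reshape(T, H * W), axis=1)
    spatial = spatial.reshape(T, H, W, 1)
    focal = feats * spatial                        # focal region selection
    # Temporal attention: one weight per frame over the whole clip.
    temporal = softmax(focal.sum(axis=(1, 2, 3)))  # shape (T,)
    # Aggregate into a single clip-level feature vector of size C.
    clip_feat = (focal.sum(axis=(1, 2)) * temporal[:, None]).sum(axis=0)
    return clip_feat, spatial, temporal
```

Both attention maps are proper distributions (the spatial weights sum to 1 per frame, the temporal weights sum to 1 over the clip), so the module re-allocates rather than rescales the feature energy.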

📝 Abstract
In this paper, we introduce the latest solution developed by our team, HFUT-VUT, for the Micro-gesture Online Recognition track of the IJCAI 2025 MiGA Challenge. The Micro-gesture Online Recognition task is a highly challenging problem that aims to locate the temporal positions and recognize the categories of multiple micro-gesture instances in untrimmed videos. Compared to traditional temporal action detection, this task places greater emphasis on distinguishing between micro-gesture categories and precisely identifying the start and end times of each instance. Moreover, micro-gestures are typically spontaneous human actions, exhibiting greater variability than other human actions. To address these challenges, we propose hand-crafted data augmentation and spatial-temporal attention to enhance the model's ability to classify and localize micro-gestures more accurately. Our solution achieved an F1 score of 38.03, outperforming the previous state-of-the-art by 37.9%. As a result, our method ranked first in the Micro-gesture Online Recognition track.
Problem

Research questions and friction points this paper is trying to address.

Locate and recognize micro-gestures in untrimmed videos
Distinguish micro-gesture categories with precise timing
Handle spontaneous human actions with high variability
Innovation

Methods, ideas, or system contributions that make the work stand out.

Hand-crafted data augmentation for micro-gestures
Spatial-temporal attention for precise localization
Enhanced classification and localization accuracy
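One simple way to realize a motion-sensitive, hand-crafted augmentation like the one listed above is to amplify inter-frame differences so that subtle dynamics become easier to model. The sketch below is a hypothetical illustration, not the paper's actual recipe: the function name, amplification factor, and clipping range are all assumptions.

```python
import numpy as np

def amplify_motion(frames, alpha=2.0):
    """Amplify frame-to-frame changes in a grayscale clip (T, H, W) in [0, 1].

    Static regions are left untouched; each frame's change relative to the
    previous original frame is scaled by `alpha` (an assumed hyperparameter),
    making subtle micro-gesture motion more pronounced.
    """
    frames = np.asarray(frames, dtype=np.float64)
    out = frames.copy()
    diffs = np.diff(frames, axis=0)        # (T-1, H, W) motion signal
    out[1:] = frames[:-1] + alpha * diffs  # scale each step's change
    return np.clip(out, 0.0, 1.0)
```

For example, a pixel that brightens by 0.1 between consecutive frames would brighten by 0.2 under `alpha=2.0`, while a perfectly static clip passes through unchanged.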
Pengyu Liu
School of Computer Science and Information Engineering, School of Artificial Intelligence, Hefei University of Technology (HFUT)
Kun Li
ReLER, CCAI, Zhejiang University, China
Fei Wang
School of Computer Science and Information Engineering, School of Artificial Intelligence, Hefei University of Technology (HFUT)
Yanyan Wei
Hefei University of Technology (HFUT)
Robust Image Perception · LLM · AI Agent
Junhui She
Key Laboratory of Knowledge Engineering with Big Data (HFUT), Ministry of Education
Dan Guo
IEEE Senior Member, Professor, Hefei University of Technology
Multimedia Computing · Artificial Intelligence