FG-SGL: Fine-Grained Semantic Guidance Learning via Motion Process Decomposition for Micro-Gesture Recognition

2026-03-17
Citations: 0
Influential: 0
AI Summary
This work addresses the challenge of micro-gesture recognition, where subtle inter-class differences hinder effective modeling of local dynamic features under conventional class-level supervision. To overcome this limitation, the authors propose a fine-grained semantic-guided learning framework, introducing the first micro-gesture dataset annotated with four-dimensional fine-grained textual descriptions. They design a multi-level contrastive optimization strategy to enable coarse-to-fine joint training and incorporate two novel attention modules, Fine-Grained Semantic Attention (FG-SA) and Class-Level Prototype Attention (CP-A), to guide vision-language models toward discriminative local motion cues. Experimental results demonstrate that the proposed approach achieves competitive performance on micro-gesture recognition benchmarks, validating the efficacy of fine-grained semantic guidance in enhancing recognition accuracy.

๐Ÿ“ Abstract
Micro-gesture recognition (MGR) is challenging due to subtle inter-class variations. Existing methods rely on category-level supervision, which is insufficient for capturing subtle and localized motion differences. Thus, this paper proposes a Fine-Grained Semantic Guidance Learning (FG-SGL) framework that jointly integrates fine-grained and category-level semantics to guide vision-language models in perceiving local micro-gesture (MG) motions. The Fine-Grained Semantic Attention (FG-SA) module adopts fine-grained semantic cues to guide the learning of local motion features, while the Class-Level Prototype Attention (CP-A) module enhances the separability of MG features through category-level semantic guidance. To support fine-grained semantic guidance, this work constructs a fine-grained textual dataset with human annotations that describes the dynamic process of MGs in four refined semantic dimensions. Furthermore, a Multi-Level Contrastive Optimization strategy is designed to jointly optimize both modules in a coarse-to-fine pattern. Experiments show that FG-SGL achieves competitive performance, validating the effectiveness of fine-grained semantic guidance for MGR.
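The abstract does not give the form of the Multi-Level Contrastive Optimization objective, so the following is only a minimal sketch of one plausible reading: a standard InfoNCE-style contrastive loss applied once against category-level text embeddings and once against fine-grained text embeddings, then summed. The function names, the temperature of 0.07, the `fine_weight` parameter, and the assumption that each batch holds one sample per class are all illustrative choices, not details from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(visual, text, temperature=0.07):
    """Mean cross-entropy of matching visual[i] to text[i] against every text[j].

    Assumes the i-th visual embedding is paired with the i-th text embedding
    and that no two rows of `text` describe the same class.
    """
    losses = []
    for i, v in enumerate(visual):
        logits = [cosine(v, t) / temperature for t in text]
        # Numerically stable log-sum-exp over the logits.
        m = max(logits)
        log_sum = m + math.log(sum(math.exp(x - m) for x in logits))
        losses.append(log_sum - logits[i])
    return sum(losses) / len(losses)


def multi_level_loss(visual, class_text, fine_text, fine_weight=0.5):
    """Hypothetical coarse-plus-fine objective: category-level term plus a
    weighted fine-grained term. The weighting scheme is an assumption."""
    coarse = info_nce(visual, class_text)
    fine = info_nce(visual, fine_text)
    return coarse + fine_weight * fine
```

In this sketch the loss is small when each visual embedding is closest to its own category-level and fine-grained text embeddings, and it grows when either level is mismatched. A coarse-to-fine schedule could be approximated by increasing `fine_weight` during training, though the paper may do something different.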
Problem

Research questions and friction points this paper is trying to address.

micro-gesture recognition
fine-grained semantics
subtle motion differences
category-level supervision
vision-language models
Innovation

Methods, ideas, or system contributions that make the work stand out.

Fine-Grained Semantic Guidance
Micro-Gesture Recognition
Vision-Language Model
Motion Process Decomposition
Multi-Level Contrastive Optimization
Jinsheng Wei
Nanjing University of Posts and Telecommunications, China
Zhaodi Xu
Nanjing University of Posts and Telecommunications, China
Guanming Lu
Nanjing University of Posts and Telecommunications, China
Haoyu Chen
University of Oulu
Deep Learning, Computer Vision, Human gesture, 3D Generation, Emotion AI
Jingjie Yan
Nanjing University of Posts and Telecommunications, China