FG-SGL: Fine-Grained Semantic Guidance Learning via Motion Process Decomposition for Micro-Gesture Recognition

📅 2026-03-17
📈 Citations: 0
✨ Influential: 0

🤖 AI Summary
This work addresses micro-gesture recognition, where subtle inter-class differences make local dynamic features hard to model under conventional class-level supervision. To overcome this limitation, the authors propose a fine-grained semantic-guided learning framework and introduce the first micro-gesture dataset annotated with textual descriptions along four fine-grained semantic dimensions. They design a multi-level contrastive optimization strategy for coarse-to-fine joint training and incorporate two novel attention modules, Fine-Grained Semantic Attention (FG-SA) and Class-Level Prototype Attention (CP-A), to guide vision-language models toward discriminative local motion cues. Experimental results show that the proposed approach achieves competitive performance on micro-gesture recognition benchmarks, supporting the use of fine-grained semantic guidance to improve recognition accuracy.
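The idea of letting a textual description steer attention toward local motion cues can be illustrated with a toy example. The sketch below is not the paper's FG-SA module, whose design is not given here; it only assumes that a text embedding acts as a query over per-frame features in ordinary scaled dot-product attention. The function names and toy vectors are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def text_guided_pooling(text_query, frame_features):
    """Pool per-frame features with weights driven by a text embedding.

    text_query:     hypothetical embedding of one fine-grained description
    frame_features: one hypothetical feature vector per video frame
    Returns (weights, pooled) from scaled dot-product attention.
    """
    dim = len(text_query)
    # Similarity between the text query and each frame, scaled by sqrt(dim).
    scores = [
        sum(q * f for q, f in zip(text_query, frame)) / math.sqrt(dim)
        for frame in frame_features
    ]
    weights = softmax(scores)
    # Weighted sum of frame features: frames matching the text dominate.
    pooled = [
        sum(w * frame[i] for w, frame in zip(weights, frame_features))
        for i in range(dim)
    ]
    return weights, pooled


# Toy data (assumed): the second frame aligns best with the text query.
query = [1.0, 0.0]
frames = [[0.1, 0.9], [0.9, 0.1], [0.2, 0.2]]
weights, pooled = text_guided_pooling(query, frames)
```

In this toy setting the second frame receives the largest weight, which is the behavior a semantic-guidance module would presumably aim for on frames that contain the described motion.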

📝 Abstract
Micro-gesture recognition (MGR) is challenging due to subtle inter-class variations. Existing methods rely on category-level supervision, which is insufficient for capturing subtle and localized motion differences. Thus, this paper proposes a Fine-Grained Semantic Guidance Learning (FG-SGL) framework that jointly integrates fine-grained and category-level semantics to guide vision-language models in perceiving local MG motions. FG-SA adopts fine-grained semantic cues to guide the learning of local motion features, while CP-A enhances the separability of MG features through category-level semantic guidance. To support fine-grained semantic guidance, this work constructs a fine-grained textual dataset with human annotations that describes the dynamic process of MGs in four refined semantic dimensions. Furthermore, a Multi-Level Contrastive Optimization strategy is designed to jointly optimize both modules in a coarse-to-fine pattern. Experiments show that FG-SGL achieves competitive performance, validating the effectiveness of fine-grained semantic guidance for MGR.
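A coarse-to-fine contrastive objective of the kind the abstract mentions could look roughly like the following. This is a minimal sketch under stated assumptions, not the paper's formulation: it assumes an InfoNCE-style loss at each level, one fine-grained description per class, and a simple weighted sum of the two levels. The names `info_nce`, `multi_level_loss`, `fine_weight`, and the temperature value are all hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def info_nce(video, texts, positive_index, temperature=0.07):
    """Cross-entropy of the matching text among all candidate texts."""
    logits = [cosine(video, t) / temperature for t in texts]
    peak = max(logits)
    # log-sum-exp computed stably by subtracting the largest logit.
    log_denominator = peak + math.log(sum(math.exp(l - peak) for l in logits))
    return log_denominator - logits[positive_index]


def multi_level_loss(video, class_texts, fine_texts, label, fine_weight=0.5):
    """Assumed combination: category-level term plus weighted fine-grained term."""
    coarse = info_nce(video, class_texts, label)
    fine = info_nce(video, fine_texts, label)
    return coarse + fine_weight * fine


# Toy embeddings (assumed): the video feature is closest to class 0.
video = [1.0, 0.2]
class_texts = [[1.0, 0.0], [0.0, 1.0]]
fine_texts = [[0.9, 0.1], [0.1, 0.9]]
loss_correct = multi_level_loss(video, class_texts, fine_texts, label=0)
loss_wrong = multi_level_loss(video, class_texts, fine_texts, label=1)
```

With these toy vectors the loss is lower when the label matches the nearest text embeddings at both levels, so minimizing it pulls video features toward both the category name and its fine-grained description.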
Problem

Research questions and friction points this paper is trying to address.

micro-gesture recognition
fine-grained semantics
subtle motion differences
category-level supervision
vision-language models
Innovation

Methods, ideas, or system contributions that make the work stand out.

Fine-Grained Semantic Guidance
Micro-Gesture Recognition
Vision-Language Model
Motion Process Decomposition
Multi-Level Contrastive Optimization