🤖 AI Summary
To address insufficient modeling of subtle motion variations in micro-action recognition, this paper proposes a Motion-guided Modulation Network (MMN) that captures and modulates subtle motion cues along two paths, operating at both the skeletal and frame levels. Key contributions include: (1) a Motion-guided Skeletal Modulation module (MSM) that injects motion cues at the skeletal level as a control signal to guide spatial representation modeling; (2) a Motion-guided Temporal Modulation module (MTM) that incorporates frame-level motion information to model holistic motion patterns; and (3) a motion consistency learning strategy that aggregates motion cues from multi-scale features for classification. By combining skeleton sequence modeling with motion-guided feature modulation, MMN enhances the discriminability of fine-grained dynamic changes. Extensive experiments demonstrate state-of-the-art performance on the Micro-Action 52 and iMiGUE benchmarks, validating the efficacy of motion-guided modulation for micro-action recognition.
📝 Abstract
Micro-Actions (MAs) are an important form of non-verbal communication in social interactions, with potential applications in human emotional analysis. However, existing methods in Micro-Action Recognition often overlook the inherent subtle changes in MAs, which limits their accuracy in distinguishing visually similar micro-actions. To address this issue, we present a novel Motion-guided Modulation Network (MMN) that implicitly captures and modulates subtle motion cues to enhance spatial-temporal representation learning. Specifically, we introduce a Motion-guided Skeletal Modulation module (MSM) to inject motion cues at the skeletal level, acting as a control signal to guide spatial representation modeling. In parallel, we design a Motion-guided Temporal Modulation module (MTM) to incorporate motion information at the frame level, facilitating the modeling of holistic motion patterns in micro-actions. Finally, we propose a motion consistency learning strategy to aggregate the motion cues from multi-scale features for micro-action classification. Experimental results on the Micro-Action 52 and iMiGUE datasets demonstrate that MMN achieves state-of-the-art performance in skeleton-based micro-action recognition, underscoring the importance of explicitly modeling subtle motion cues. The code will be available at https://github.com/momiji-bit/MMN.
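To make the modulation idea concrete, the sketch below illustrates one plausible reading of motion-guided modulation: frame-to-frame differences of skeleton features serve as a motion cue, and a sigmoid gate derived from that cue rescales the features. This is a minimal, hypothetical NumPy sketch for intuition only, not the authors' implementation; the function name, shapes, and gating form are all assumptions.

```python
import numpy as np

def motion_gated_modulation(x):
    """Hedged sketch of motion-guided feature modulation (not the paper's code).

    x: skeleton features of shape (T, J, C) — T frames, J joints, C channels.
    Motion cues are taken as frame-to-frame differences; their magnitude is
    passed through a sigmoid to form a gate that modulates the features.
    """
    # Motion cue: temporal difference, padded with the first frame so the
    # output keeps shape (T, J, C).
    motion = np.diff(x, axis=0, prepend=x[:1])
    # Sigmoid gate on motion magnitude: static regions get ~0.5,
    # strongly moving joints get gates approaching 1.
    gate = 1.0 / (1.0 + np.exp(-np.abs(motion)))
    # Modulated features: motion-salient joints are emphasized.
    return x * gate

# Toy usage: 4 frames, 2 joints, 3 channels of constant features.
feats = np.ones((4, 2, 3))
out = motion_gated_modulation(feats)
print(out.shape)  # (4, 2, 3)
```

With constant input, the motion cue is zero everywhere, so the gate is uniformly 0.5 and the features are simply halved; real skeleton sequences would instead amplify joints with larger displacements relative to static ones.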