Temporal and Spatial Feature Fusion Framework for Dynamic Micro Expression Recognition

📅 2025-05-22
📈 Citations: 0
Influential: 0
🤖 AI Summary
Micro-expression recognition remains challenging due to extremely short durations (<500 ms) and highly localized facial movements, resulting in long-standing performance saturation near 50% accuracy—even for expert annotators. To address this, we propose TSFmicro, a novel framework featuring a parallel spatiotemporal fusion mechanism that explicitly models the semantic complementarity between spatial location (“where”) and motion pattern (“how”) within a high-dimensional feature space. TSFmicro synergistically integrates RetNet’s capability for long-range temporal modeling with Transformer’s multi-scale spatiotemporal interaction, enabling joint representation learning and multimodal feature fusion for dynamic micro-expressions. Evaluated on three benchmark datasets—CASME II, SAMM, and MMEW—TSFmicro achieves significant improvements over existing state-of-the-art methods, boosting average accuracy by 6.2–9.8 percentage points and, for the first time, systematically surpassing the 50% recognition barrier.

📝 Abstract
When emotions are repressed, an individual's true feelings may be revealed through micro-expressions. Consequently, micro-expressions are regarded as a genuine source of insight into an individual's authentic emotions. However, the transient and highly localised nature of micro-expressions poses a significant challenge to their accurate recognition, with the accuracy rate of micro-expression recognition being as low as 50%, even for professionals. To address these challenges, it is necessary to explore dynamic micro-expression recognition (DMER) using multimodal fusion techniques, with special attention to the diverse fusion of temporal and spatial modal features. In this paper, we propose a novel Temporal and Spatial feature Fusion framework for DMER (TSFmicro). This framework integrates a Retention Network (RetNet) and a Transformer-based DMER network, with the objective of efficient micro-expression recognition through the capture and fusion of temporal and spatial relations. Meanwhile, we propose a novel parallel time-space fusion method from the perspective of modal fusion, which fuses spatio-temporal information in a high-dimensional feature space, resulting in complementary "where-how" relationships at the semantic level and providing richer semantic information for the model. The experimental results demonstrate the superior performance of TSFmicro in comparison to other contemporary state-of-the-art methods, as evidenced by its effectiveness on three well-recognised micro-expression datasets.
Problem

Research questions and friction points this paper is trying to address.

Accuracy of dynamic micro-expression recognition stays near 50%, even for trained experts
Effectively fusing temporal and spatial modal features for recognition
Applying multimodal fusion techniques to improve DMER
Innovation

Methods, ideas, or system contributions that make the work stand out.

Fuses temporal and spatial features for DMER
Uses Retention Network and transformer-based network
Parallel time-space fusion in high-dimensional space
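The parallel fusion idea above can be sketched in miniature: a temporal branch summarizes motion over frames ("how") while a spatial branch summarizes facial regions ("where"), and the two summaries are joined in feature space. This is a minimal NumPy illustration, not the paper's implementation; the shapes, the mean pooling, and the concatenation step are all assumptions standing in for the actual RetNet and Transformer branches.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical shapes: 8 video clips, 16 frames, 49 spatial patches, 64-dim features.
clips, frames, patches, dim = 8, 16, 49, 64

# Stand-ins for the two branch outputs (the real branches are RetNet / Transformer).
temporal_feats = rng.standard_normal((clips, frames, dim))   # "how": motion over time
spatial_feats = rng.standard_normal((clips, patches, dim))   # "where": facial regions

def parallel_fusion(temporal, spatial):
    """Pool each branch, then concatenate in feature space -- a simple stand-in
    for fusing the two modalities in a shared high-dimensional space."""
    t = temporal.mean(axis=1)                # (clips, dim) temporal summary
    s = spatial.mean(axis=1)                 # (clips, dim) spatial summary
    return np.concatenate([t, s], axis=-1)   # (clips, 2*dim) fused "where-how" vector

fused = parallel_fusion(temporal_feats, spatial_feats)
print(fused.shape)  # (8, 128)
```

The key property of a parallel (rather than sequential) fusion is that neither branch is conditioned on the other: both summaries are computed independently and combined afterwards, so each contributes its own semantic axis to the fused representation.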