🤖 AI Summary
Existing micro-expression analysis methods suffer from rigid fixed-length temporal segmentation and hard decision-making, limiting their ability to capture fine-grained temporal dynamics. To address this, we propose a prior-guided video-level regression framework that eliminates hand-crafted windowing and instead employs a learnable interval selection strategy to adaptively localize onset, apex, and offset phases. We explicitly incorporate prior knowledge—including micro-expression evolution patterns, duration distributions, and class-specific statistics—to regularize temporal modeling. Furthermore, we design a shared-backbone co-optimization architecture that jointly learns detection and recognition, enhancing generalization under low-data regimes. Evaluated on CAS(ME)³ and SAMMLV, our method achieves STRS scores of 0.0562 and 0.2000, respectively, substantially outperforming prior state-of-the-art approaches on both benchmarks.
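The prior-guided interval selection above could, in spirit, be decoded along these lines. This is a minimal pure-Python sketch, not the paper's implementation: the duration bounds, per-frame regression offsets, and function name are all illustrative assumptions standing in for the learned priors and heads.

```python
# Assumed duration prior for a micro-expression, in frames (illustrative values).
ME_MIN_LEN, ME_MAX_LEN = 5, 15

def decode_interval(apex_scores, onset_offsets, offset_offsets):
    """Hypothetical decoder: pick the apex frame from per-frame scores,
    place onset/offset from per-frame regressed distances, then clamp
    the resulting span to the assumed duration prior."""
    apex = max(range(len(apex_scores)), key=apex_scores.__getitem__)
    onset = apex - onset_offsets[apex]
    offset = apex + offset_offsets[apex]
    # Regularize the interval length with the duration prior.
    length = offset - onset
    if length < ME_MIN_LEN:
        offset = onset + ME_MIN_LEN
    elif length > ME_MAX_LEN:
        offset = onset + ME_MAX_LEN
    # Keep the interval inside the video.
    onset = max(0, onset)
    offset = min(len(apex_scores) - 1, offset)
    return onset, apex, offset
```

For example, with a score peak at frame 10 and regressed distances of 4 frames on each side, the decoder returns the interval (6, 10, 14); a degenerate 2-frame span would instead be stretched to the assumed minimum length.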
📝 Abstract
Micro-expressions (MEs) are involuntary, low-intensity, and short-duration facial expressions that often reveal an individual's genuine thoughts and emotions. Most existing ME analysis methods rely on window-level classification with fixed window sizes and hard decisions, which limits their ability to capture the complex temporal dynamics of MEs. Although recent approaches have adopted video-level regression frameworks to address some of these challenges, interval decoding still depends on manually predefined, window-based methods, leaving the issue only partially mitigated. In this paper, we propose a prior-guided video-level regression method for ME analysis. We introduce a scalable interval selection strategy that comprehensively considers the temporal evolution, duration, and class distribution characteristics of MEs, enabling precise spotting of the onset, apex, and offset phases. In addition, we introduce a synergistic optimization framework, in which the spotting and recognition tasks share parameters except for the classification heads. This fully exploits complementary information, makes more efficient use of limited data, and enhances the model's capability. Extensive experiments on multiple benchmark datasets demonstrate the state-of-the-art performance of our method, with an STRS of 0.0562 on CAS(ME)$^3$ and 0.2000 on SAMMLV. The code is available at https://github.com/zizheng-guo/BoostingVRME.
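The synergistic optimization described above—spotting and recognition sharing all parameters except the classification heads—can be pictured as one backbone feeding two lightweight task heads. The following is a minimal pure-Python forward-pass sketch under that reading; all dimensions, weight initializations, and names are illustrative assumptions, not the paper's architecture.

```python
import random

# Illustrative sizes: input feature dim, shared feature dim, emotion classes.
D_IN, D_FEAT, N_CLASSES = 8, 4, 3
random.seed(0)

def rand_matrix(rows, cols):
    return [[random.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

W_backbone = rand_matrix(D_IN, D_FEAT)    # shared parameters for both tasks
W_spot = rand_matrix(D_FEAT, 3)           # spotting head: onset/apex/offset regression
W_recog = rand_matrix(D_FEAT, N_CLASSES)  # recognition head: class logits

def matmul(x, W):
    # Multiply a vector by a matrix, column by column.
    return [sum(a * b for a, b in zip(x, col)) for col in zip(*W)]

def forward(x):
    # Shared ReLU features, then two task-specific linear heads.
    feat = [max(v, 0.0) for v in matmul(x, W_backbone)]
    return matmul(feat, W_spot), matmul(feat, W_recog)

x = [random.uniform(-1, 1) for _ in range(D_IN)]
boundaries, logits = forward(x)
```

Because the backbone is shared, gradients from both the spotting loss and the recognition loss would update `W_backbone`, which is one plausible mechanism behind the data-efficiency claim in low-data regimes.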