MEGC2025: Micro-Expression Grand Challenge on Spot Then Recognize and Visual Question Answering

📅 2025-06-18
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work presents MEGC2025, a grand challenge targeting micro-expression (ME) analysis in realistic long videos, where the conventional practice of treating spotting and recognition as separate tasks is suboptimal. The challenge introduces two tasks: ME spot-then-recognize (ME-STR), which integrates ME spotting and subsequent recognition in a unified sequential pipeline, and ME visual question answering (ME-VQA), which probes ME understanding through diverse question types answered with multimodal large language models (MLLMs) or large vision-language models (LVLMs). Together, the tasks move ME research beyond isolated single-task classification toward temporal localization and multimodal semantic reasoning. All participating algorithms run on a common test set and report results on a public leaderboard, establishing a standardized benchmark for implicit emotion analysis in high-stakes scenarios.
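As a concrete (unofficial) illustration of how spotting results are typically scored on such leaderboards: earlier MEGC spotting rounds count a predicted interval as a true positive when its intersection-over-union (IoU) with a ground-truth interval reaches 0.5, and report F1 over the matches. The sketch below implements that convention; it is an assumption carried over from prior rounds, not the official MEGC2025 metric.

```python
# Illustrative sketch of interval-level spotting evaluation. Earlier MEGC
# spotting rounds treat a prediction as a true positive when its IoU with
# a ground-truth interval is >= 0.5; this is NOT necessarily the official
# MEGC2025 metric, just the common convention.

def interval_iou(a, b):
    """IoU of two (onset, offset) frame intervals, endpoints inclusive."""
    inter = max(0, min(a[1], b[1]) - max(a[0], b[0]) + 1)
    union = (a[1] - a[0] + 1) + (b[1] - b[0] + 1) - inter
    return inter / union

def spotting_f1(preds, gts, iou_thresh=0.5):
    """F1 under greedy one-to-one matching of predictions to ground truth."""
    matched, tp = set(), 0
    for p in preds:
        for i, g in enumerate(gts):
            if i not in matched and interval_iou(p, g) >= iou_thresh:
                matched.add(i)
                tp += 1
                break
    precision = tp / len(preds) if preds else 0.0
    recall = tp / len(gts) if gts else 0.0
    return (2 * precision * recall / (precision + recall)
            if precision + recall else 0.0)
```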

📝 Abstract
Facial micro-expressions (MEs) are involuntary movements of the face that occur spontaneously when a person experiences an emotion but attempts to suppress or repress the facial expression, typically in high-stakes environments. In recent years, substantial advancements have been made in the areas of ME recognition, spotting, and generation. However, conventional approaches that treat spotting and recognition as separate tasks are suboptimal, particularly for analyzing long-duration videos in realistic settings. Concurrently, the emergence of multimodal large language models (MLLMs) and large vision-language models (LVLMs) offers promising new avenues for enhancing ME analysis through their powerful multimodal reasoning capabilities. The ME grand challenge (MEGC) 2025 introduces two tasks that reflect these evolving research directions: (1) ME spot-then-recognize (ME-STR), which integrates ME spotting and subsequent recognition in a unified sequential pipeline; and (2) ME visual question answering (ME-VQA), which explores ME understanding through visual question answering, leveraging MLLMs or LVLMs to address diverse question types related to MEs. All participating algorithms are required to run on a designated test set and submit their results to a leaderboard. More details are available at https://megc2025.github.io.
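To make the ME-STR pipeline concrete, here is a minimal sketch of a spot-then-recognize loop: slide a window over the video, keep high-scoring windows as candidate intervals, merge overlaps, then classify each spotted interval. The functions `spot_score` and `classify_clip` are hypothetical stand-ins for trained spotting and recognition models, and the window/stride/threshold values are illustrative assumptions, not the challenge baseline.

```python
# Minimal sketch of an ME spot-then-recognize (ME-STR) pipeline.
# Assumptions (not from the paper): `spot_score` and `classify_clip`
# are hypothetical stand-ins for trained spotting/recognition models.
from dataclasses import dataclass

@dataclass
class Prediction:
    onset: int    # first frame of the spotted interval
    offset: int   # last frame of the spotted interval (inclusive)
    emotion: str  # label assigned by the recognizer

def spot_then_recognize(frames, spot_score, classify_clip,
                        win=32, stride=8, thresh=0.5):
    """Stage 1: slide a window and keep windows whose spotting score
    clears the threshold, merging overlaps into intervals.
    Stage 2: run the recognizer on each spotted interval."""
    hits = [(s, s + win - 1)
            for s in range(0, len(frames) - win + 1, stride)
            if spot_score(frames[s:s + win]) >= thresh]
    intervals = []
    for a, b in hits:  # hits are already in ascending onset order
        if intervals and a <= intervals[-1][1]:
            intervals[-1] = (intervals[-1][0], max(intervals[-1][1], b))
        else:
            intervals.append((a, b))
    return [Prediction(a, b, classify_clip(frames[a:b + 1]))
            for a, b in intervals]
```

In a real submission the two stages might share a backbone or be trained jointly; the sequential structure of localizing first and labeling second is what distinguishes ME-STR from treating spotting and recognition as separate tasks.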
Problem

Research questions and friction points this paper is trying to address.

Integrate micro-expression spotting and recognition in videos
Enhance ME analysis using multimodal large language models
Explore ME understanding through visual question answering (a minimal LVLM sketch follows this list)
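As a concrete illustration of the ME-VQA setting, the sketch below poses a question about a spotted frame to an off-the-shelf LVLM via the Hugging Face transformers LLaVA interface. The model checkpoint, the apex-frame input, and the prompt wording are assumptions for illustration, not the challenge baseline.

```python
# Minimal ME-VQA sketch: ask an off-the-shelf LVLM a question about the
# apex frame of a spotted micro-expression. The model choice (LLaVA-1.5)
# and the apex-frame heuristic are assumptions, not the MEGC2025 baseline.
from PIL import Image
from transformers import AutoProcessor, LlavaForConditionalGeneration

MODEL = "llava-hf/llava-1.5-7b-hf"
processor = AutoProcessor.from_pretrained(MODEL)
model = LlavaForConditionalGeneration.from_pretrained(MODEL)

def ask_about_frame(frame_path: str, question: str) -> str:
    image = Image.open(frame_path)
    # LLaVA-1.5 chat template: the <image> token marks where the
    # visual features are spliced into the prompt.
    prompt = f"USER: <image>\n{question} ASSISTANT:"
    inputs = processor(images=image, text=prompt, return_tensors="pt")
    out = model.generate(**inputs, max_new_tokens=64)
    return processor.decode(out[0], skip_special_tokens=True)

# e.g. ask_about_frame("apex.png",
#     "What emotion does this micro-expression most likely convey?")
```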
Innovation

Methods, ideas, or system contributions that make the work stand out.

Unified sequential ME spotting and recognition
Multimodal large language models for ME analysis
Visual question answering for ME understanding