MVP: Multimodal Emotion Recognition based on Video and Physiological Signals

📅 2025-01-06
🤖 AI Summary
Long-term (1–2 minute) multimodal emotion recognition faces challenges in modeling dynamic cross-modal interactions and effectively fusing long video sequences with multi-channel physiological signals (e.g., EDA, ECG/PPG). To address this, we propose MVP, a lightweight attention-driven video-physiology fusion architecture. MVP introduces the first unified deep learning framework integrating a dual-stream CNN-LSTM video encoder, a time-frequency feature extraction network for physiological signals, and a cross-modal alignment module with adaptive weighted fusion. Crucially, MVP enables end-to-end co-optimization of visual and multi-channel physiological representations, substantially enhancing long-sequence modeling capability. Evaluated on standard benchmarks, MVP achieves a 4.2–6.8% absolute accuracy improvement over state-of-the-art methods under the joint video+EDA+ECG/PPG modality. Comprehensive experiments further validate its robustness and generalizability across diverse subjects and recording conditions.

📝 Abstract
Human emotions entail a complex set of behavioral, physiological, and cognitive changes. Current state-of-the-art models fuse the behavioral and physiological components using classic machine learning rather than recent deep learning techniques. We propose to fill this gap with the Multimodal for Video and Physio (MVP) architecture, designed to fuse video and physiological signals. Unlike other approaches, MVP exploits attention to handle long input sequences (1–2 minutes). We study video and physiological backbones suited to long input sequences and evaluate our method against the state of the art. Our results show that MVP outperforms previous methods for emotion recognition based on facial videos, EDA, and ECG/PPG.
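The abstract says MVP relies on attention to handle 1–2 minute inputs but does not detail the fusion mechanism here. As a rough illustration only, the sketch below shows generic bidirectional scaled dot-product cross-attention between a video token sequence and a physiological token sequence, written in NumPy. It is not the authors' MVP implementation: the function names (`cross_attention`, `fuse`), token counts, feature sizes, random weights, and the mean-pool-and-concatenate step are all assumptions chosen for the example.

```python
import numpy as np

def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    # Subtract the max for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(queries, context, w_q, w_k, w_v):
    """Scaled dot-product cross-attention: every query token attends over all context tokens."""
    q = queries @ w_q                        # (T_q, d)
    k = context @ w_k                        # (T_c, d)
    v = context @ w_v                        # (T_c, d)
    scores = q @ k.T / np.sqrt(q.shape[-1])  # (T_q, T_c)
    return softmax(scores, axis=-1) @ v      # (T_q, d)

def fuse(video_tokens, physio_tokens, params):
    """Hypothetical bidirectional fusion: video attends to physiology and vice versa."""
    v2p = cross_attention(video_tokens, physio_tokens, *params["v2p"])
    p2v = cross_attention(physio_tokens, video_tokens, *params["p2v"])
    # Mean-pool over time so inputs of any length yield a fixed-size vector.
    return np.concatenate([v2p.mean(axis=0), p2v.mean(axis=0)])

rng = np.random.default_rng(0)
d_video, d_physio, d = 64, 16, 32
# Hypothetical inputs: 120 per-second video embeddings (2 minutes) and
# 240 half-second physiological feature windows (e.g. EDA + PPG).
video_tokens = rng.standard_normal((120, d_video))
physio_tokens = rng.standard_normal((240, d_physio))
# Random projection weights stand in for learned parameters.
params = {
    "v2p": (rng.standard_normal((d_video, d)),
            rng.standard_normal((d_physio, d)),
            rng.standard_normal((d_physio, d))),
    "p2v": (rng.standard_normal((d_physio, d)),
            rng.standard_normal((d_video, d)),
            rng.standard_normal((d_video, d))),
}
fused = fuse(video_tokens, physio_tokens, params)
print(fused.shape)  # (64,)
```

Because every query token is compared with every context token, the score matrix grows as T_q × T_c (here 120 × 240); mean-pooling then yields a fixed-size vector regardless of sequence length, which a classification head could consume.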
Problem

Research questions and friction points this paper is trying to address.

Emotion Recognition
Integration of Behavioral and Physiological Responses
Deep Learning
Innovation

Methods, ideas, or system contributions that make the work stand out.

MVP System
Attention Mechanism
Emotion Recognition
Authors

Valeriya Strizhkova
INRIA
Deep Learning

Hadi Kachmar
Centre Inria d’Université Côte d’Azur, Sophia Antipolis, France

Hava Chaptoukaev
PhD student, Eurecom

Raphael Kalandadze
Georgian Technical University, Tbilisi, Georgia

Natia Kukhilava
Georgian Technical University
NLP, LLM, Neuroscience

Tatia Tsmindashvili
Unknown affiliation
LLM, Neuroscience

Nibras Abo-Alzahab
Centre Inria d’Université Côte d’Azur, Sophia Antipolis, France

Maria A. Zuluaga
EURECOM

Michal Balazia
Research scientist, Centre de Recherche INRIA d'Université Côte d'Azur
gait recognition, motion capture, scene understanding, face uniqueness, neurocognitive disorders

A. Dantcheva
Centre Inria d’Université Côte d’Azur, Sophia Antipolis, France

François Brémond
Centre Inria d’Université Côte d’Azur, Sophia Antipolis, France

Laura M. Ferrari
Centre Inria d’Université Côte d’Azur, Sophia Antipolis, France; Scuola Superiore Sant’Anna, Pontedera, Italy