FedMultiEmo: Real-Time Emotion Recognition via Multimodal Federated Learning

📅 2025-07-21

📈 Citations: 0

✨ Influential: 0

career value

243K/year

🤖 AI Summary

This work addresses three key challenges in in-vehicle emotion recognition: (1) modality fragility—visual degradation under varying illumination or occlusion; (2) inter-individual physiological variability—e.g., heterogeneity in heart rate and galvanic skin response; and (3) privacy risks—sensitive biometric data transmission to centralized servers. To this end, we propose a multimodal federated learning framework that fuses visual features (extracted via CNN from facial images) and physiological signals (classified using Random Forest) at the decision level. We introduce a personalized federated averaging algorithm to weight client-specific model updates and design a lightweight edge–cloud co-inference prototype system. Evaluated on FER2013 and a custom physiological dataset, the fused model achieves 87% accuracy—on par with centralized training—converges within 18 communication rounds, incurs an average per-round latency of 120 seconds, and requires <200 MB memory per client. To our knowledge, this is the first real-time in-vehicle emotion recognition system simultaneously ensuring robustness, personalization, and privacy preservation.

Technology Category

Application Category

📝 Abstract

In-vehicle emotion recognition underpins adaptive driver-assistance systems and, ultimately, occupant safety. However, practical deployment is hindered by (i) modality fragility - poor lighting and occlusions degrade vision-based methods; (ii) physiological variability - heart-rate and skin-conductance patterns differ across individuals; and (iii) privacy risk - centralized training requires transmission of sensitive data. To address these challenges, we present FedMultiEmo, a privacy-preserving framework that fuses two complementary modalities at the decision level: visual features extracted by a Convolutional Neural Network from facial images, and physiological cues (heart rate, electrodermal activity, and skin temperature) classified by a Random Forest. FedMultiEmo builds on three key elements: (1) a multimodal federated learning pipeline with majority-vote fusion, (2) an end-to-end edge-to-cloud prototype on Raspberry Pi clients and a Flower server, and (3) a personalized Federated Averaging scheme that weights client updates by local data volume. Evaluated on FER2013 and a custom physiological dataset, the federated Convolutional Neural Network attains 77% accuracy, the Random Forest 74%, and their fusion 87%, matching a centralized baseline while keeping all raw data local. The developed system converges in 18 rounds, with an average round time of 120 seconds and a per-client memory footprint below 200 MB. These results indicate that FedMultiEmo offers a practical approach to real-time, privacy-aware emotion recognition in automotive settings.

Problem

Research questions and friction points this paper is trying to address.

Overcoming modality fragility in vision-based emotion recognition

Addressing physiological variability across individuals

Ensuring privacy by avoiding centralized sensitive data transmission

Innovation

Methods, ideas, or system contributions that make the work stand out.

Multimodal federated learning with majority-vote fusion

Edge-to-cloud prototype on Raspberry Pi clients

Personalized Federated Averaging by data volume

🔎 Similar Papers

OV-MER: Towards Open-Vocabulary Multimodal Emotion Recognition