🤖 AI Summary
This work addresses three key challenges in in-vehicle emotion recognition: (1) modality fragility—visual degradation under varying illumination or occlusion; (2) inter-individual physiological variability—e.g., heterogeneity in heart rate and galvanic skin response; and (3) privacy risks—sensitive biometric data transmission to centralized servers. To this end, we propose a multimodal federated learning framework that fuses visual features (extracted via CNN from facial images) and physiological signals (classified using Random Forest) at the decision level. We introduce a personalized federated averaging algorithm to weight client-specific model updates and design a lightweight edge–cloud co-inference prototype system. Evaluated on FER2013 and a custom physiological dataset, the fused model achieves 87% accuracy—on par with centralized training—converges within 18 communication rounds, incurs an average per-round latency of 120 seconds, and requires <200 MB memory per client. To our knowledge, this is the first real-time in-vehicle emotion recognition system simultaneously ensuring robustness, personalization, and privacy preservation.
📝 Abstract
In-vehicle emotion recognition underpins adaptive driver-assistance systems and, ultimately, occupant safety. However, practical deployment is hindered by (i) modality fragility - poor lighting and occlusions degrade vision-based methods; (ii) physiological variability - heart-rate and skin-conductance patterns differ across individuals; and (iii) privacy risk - centralized training requires transmission of sensitive data. To address these challenges, we present FedMultiEmo, a privacy-preserving framework that fuses two complementary modalities at the decision level: visual features extracted by a Convolutional Neural Network from facial images, and physiological cues (heart rate, electrodermal activity, and skin temperature) classified by a Random Forest. FedMultiEmo builds on three key elements: (1) a multimodal federated learning pipeline with majority-vote fusion, (2) an end-to-end edge-to-cloud prototype on Raspberry Pi clients and a Flower server, and (3) a personalized Federated Averaging scheme that weights client updates by local data volume. Evaluated on FER2013 and a custom physiological dataset, the federated Convolutional Neural Network attains 77% accuracy, the Random Forest 74%, and their fusion 87%, matching a centralized baseline while keeping all raw data local. The developed system converges in 18 rounds, with an average round time of 120 seconds and a per-client memory footprint below 200 MB. These results indicate that FedMultiEmo offers a practical approach to real-time, privacy-aware emotion recognition in automotive settings.