ExpressNet-MoE: A Hybrid Deep Neural Network for Emotion Recognition

📅 2025-10-15
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Facial expression recognition (FER) suffers from poor generalization in real-world scenarios due to head pose variations, occlusions, illumination changes, and demographic diversity. To address this, the authors propose ExpressNet-MoE, an end-to-end hybrid framework that integrates multi-scale CNN feature extraction with a Mixture-of-Experts (MoE) mechanism. The key innovation is a dynamic expert-selection module that couples a residual backbone with multi-scale feature fusion, enabling adaptive weighting of robust, discriminative features. This design improves invariance to real-world nuisance factors and supports cross-dataset transferability. Experiments report accuracies of 74.77% on AffectNet (v7), 72.55% on AffectNet (v8), 84.29% on RAF-DB, and 64.66% on FER-2013, competitive with prior methods, with publicly released code for reproducibility.
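
The dynamic expert-selection idea described in the summary can be illustrated with a minimal sketch, assuming a PyTorch-style softmax-gated Mixture-of-Experts head; the class name `MoEHead`, the expert count, and the feature dimensions are illustrative assumptions, not taken from the paper's released code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoEHead(nn.Module):
    """Mixture-of-Experts head: a gating network produces per-sample
    weights that adaptively combine the outputs of several experts."""

    def __init__(self, in_dim=512, hidden_dim=256, num_experts=4, num_classes=7):
        super().__init__()
        # Each expert is a small MLP mapping shared features to class logits.
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(in_dim, hidden_dim), nn.ReLU(),
                          nn.Linear(hidden_dim, num_classes))
            for _ in range(num_experts)
        ])
        # The gate scores how relevant each expert is for the current input.
        self.gate = nn.Linear(in_dim, num_experts)

    def forward(self, features):                                   # (B, in_dim)
        weights = F.softmax(self.gate(features), dim=-1)           # (B, E)
        expert_out = torch.stack(
            [expert(features) for expert in self.experts], dim=1)  # (B, E, C)
        # Weighted sum over experts yields the final logits.
        return (weights.unsqueeze(-1) * expert_out).sum(dim=1)     # (B, C)

# Example: features from any backbone (random here) are routed through the experts.
logits = MoEHead()(torch.randn(8, 512))
print(logits.shape)  # torch.Size([8, 7])
```
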

📝 Abstract
In many domains, including online education, healthcare, security, and human-computer interaction, facial emotion recognition (FER) is essential. Despite its importance, real-world FER remains difficult because of factors such as variable head poses, occlusions, illumination shifts, and demographic diversity. Engagement detection, which is essential for applications such as virtual learning and customer service, is frequently hampered by these limitations in many current models. In this article, we propose ExpressNet-MoE, a novel hybrid deep learning model that blends Convolutional Neural Networks (CNNs) with a Mixture-of-Experts (MoE) framework to overcome these difficulties. The model dynamically chooses the most relevant expert networks, which aids generalization and gives it flexibility across a wide variety of datasets. It further improves emotion recognition accuracy by using multi-scale feature extraction to capture both global and local facial features. ExpressNet-MoE comprises multiple CNN-based feature extractors, an MoE module for adaptive feature selection, and a residual network backbone for deep feature learning. To demonstrate the efficacy of the proposed model, we evaluated it on several datasets and compared it with current state-of-the-art methods. Our model achieves accuracies of 74.77% on AffectNet (v7), 72.55% on AffectNet (v8), 84.29% on RAF-DB, and 64.66% on FER-2013. The results show how adaptable our model is and how it can be used to build end-to-end emotion recognition systems in practical settings. Reproducible code and results are publicly available at https://github.com/DeeptimaanB/ExpressNet-MoE.
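
A minimal sketch of the multi-scale feature extraction mentioned in the abstract, assuming a PyTorch implementation: parallel convolution branches with different kernel sizes capture fine local detail and broader facial structure, and a 1x1 convolution fuses them. The layer widths and kernel sizes are illustrative assumptions, not the authors' exact configuration.

```python
import torch
import torch.nn as nn

class MultiScaleBlock(nn.Module):
    """Parallel convolution branches with different receptive fields capture
    local and global facial cues; their feature maps are concatenated and
    fused with a 1x1 convolution."""

    def __init__(self, in_ch=3, branch_ch=16):
        super().__init__()
        self.branch3 = nn.Conv2d(in_ch, branch_ch, kernel_size=3, padding=1)
        self.branch5 = nn.Conv2d(in_ch, branch_ch, kernel_size=5, padding=2)
        self.branch7 = nn.Conv2d(in_ch, branch_ch, kernel_size=7, padding=3)
        self.fuse = nn.Conv2d(3 * branch_ch, branch_ch, kernel_size=1)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        feats = torch.cat([self.branch3(x), self.branch5(x), self.branch7(x)], dim=1)
        return self.act(self.fuse(feats))

# Example: a 48x48 face crop produces a fused 16-channel feature map that a
# residual backbone could then process further.
out = MultiScaleBlock()(torch.randn(1, 3, 48, 48))
print(out.shape)  # torch.Size([1, 16, 48, 48])
```
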
Problem

Research questions and friction points this paper is trying to address.

Overcoming real-world facial emotion recognition challenges such as occlusions and illumination changes
Improving engagement detection accuracy for virtual learning applications
Enhancing emotion recognition generalization across diverse demographic datasets
Innovation

Methods, ideas, or system contributions that make the work stand out.

Hybrid CNN and Mixture of Experts framework
Dynamic expert network selection for generalization
Multi-scale feature extraction for facial details
Deeptimaan Banerjee
Computer Science & Engineering, University of Colorado Denver, CO 80234
Prateek Gothwal
Computer Science & Engineering, University of Colorado Denver, CO 80234
Ashis Kumer Biswas
University of Colorado Denver
Machine Learning · Deep Learning · Bioinformatics