🤖 AI Summary
Robust group emotion recognition in unconstrained, multi-person scenarios remains challenging due to input uncertainty (e.g., occlusion, crowd density) and inconsistent individual predictions caused by the availability of only coarse-grained group-level supervision.
Method: We propose an uncertainty-aware learning framework that (i) introduces Gaussian random embedding to model the probabilistic distribution of individual emotions; (ii) designs an uncertainty-sensitive adaptive weighting mechanism for feature fusion; and (iii) integrates a tri-branch feature extractor (face/object/scene), random embedding representations, and targeted image augmentation.
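The Gaussian random embedding in (i) can be illustrated with the standard reparameterization trick: the network predicts a mean and a log-variance per individual, and embeddings are sampled from the resulting Gaussian so repeated inference passes yield diverse predictions. This is a minimal sketch in NumPy under assumed shapes; the function name and the choice of `log_var` parameterization are illustrative, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def stochastic_embedding(mu, log_var, n_samples=5, rng=rng):
    """Sample z = mu + sigma * eps (reparameterization trick).

    mu, log_var: (dim,) predicted Gaussian parameters for one individual.
    Returns (n_samples, dim) stochastic embeddings; larger log_var means
    the samples, and hence the downstream predictions, spread out more.
    """
    sigma = np.exp(0.5 * log_var)
    eps = rng.standard_normal((n_samples, mu.shape[-1]))
    return mu + sigma * eps

# Hypothetical predicted parameters for a 4-d embedding
mu = np.zeros(4)
log_var = np.full(4, -2.0)
samples = stochastic_embedding(mu, log_var)
```

At inference, averaging predictions over several such samples is one common way to exploit this stochasticity.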
Contribution/Results: Our method achieves significant improvements in accuracy and cross-domain generalization on three benchmark datasets. It effectively mitigates the adverse effects of label ambiguity and input noise, establishing a novel weakly supervised paradigm for group emotion analysis.
📝 Abstract
Group-level emotion recognition (GER) is an inseparable part of human behavior analysis, aiming to recognize an overall emotion in a multi-person scene. However, existing methods are devoted to combining diverse emotion cues while ignoring the inherent uncertainties of unconstrained environments, such as congestion and occlusion within a group. Additionally, since only group-level labels are available, inconsistent emotion predictions among individuals in one group can confuse the network. In this paper, we propose an uncertainty-aware learning (UAL) method to extract more robust representations for GER. By explicitly modeling the uncertainty of each individual, we utilize stochastic embeddings drawn from a Gaussian distribution instead of deterministic point embeddings. This representation captures the probabilities of different emotions and generates diverse predictions through this stochasticity during the inference stage. Furthermore, uncertainty-sensitive scores are adaptively assigned as the fusion weights of individuals' faces within each group. Moreover, we develop an image enhancement module to strengthen the model's robustness against severe noise. The overall three-branch model, encompassing face, object, and scene components, is guided by a proportional-weighted fusion strategy and integrates the proposed uncertainty-aware method to produce the final group-level output. Experimental results demonstrate the effectiveness and generalization ability of our method across three widely used databases.
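The uncertainty-sensitive fusion weights described above can be sketched as follows: each face's uncertainty is summarized from its predicted variance, and faces with lower uncertainty receive higher weight in the group-level fusion. The specific mapping from uncertainty to weight (a softmax over negated uncertainties) is an illustrative assumption, not the paper's exact formula.

```python
import numpy as np

def uncertainty_weighted_fusion(face_feats, log_vars):
    """Fuse per-face features with uncertainty-sensitive weights.

    face_feats: (n_faces, dim) individual face features.
    log_vars:   (n_faces, dim) predicted log-variances per face.
    Returns the fused (dim,) feature and the (n_faces,) weights.
    """
    # Summarize each face's uncertainty as its mean predicted variance
    uncertainty = np.exp(log_vars).mean(axis=1)
    # Softmax over negative uncertainty: confident faces weigh more
    logits = -uncertainty
    w = np.exp(logits - logits.max())
    w /= w.sum()
    return w @ face_feats, w

# Two hypothetical faces; the first is far less uncertain
face_feats = np.array([[1.0, 0.0],
                       [0.0, 1.0]])
log_vars = np.array([[-4.0, -4.0],
                     [ 0.0,  0.0]])
fused, w = uncertainty_weighted_fusion(face_feats, log_vars)
```

A proportional weighting like this lets occluded or low-quality faces contribute less to the group-level prediction without being discarded outright.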