Incorporating Scene Context and Semantic Labels for Enhanced Group-level Emotion Recognition

📅 2025-09-25
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Existing group-level emotion recognition (GER) methods neglect the role of visual scene context in modeling interpersonal relationships and fail to fully exploit the semantic richness of emotion labels. To address these limitations, we propose a cross-modal collaborative framework: first, multi-scale convolutional networks encode visual scenes to explicitly model spatial and semantic dependencies among individuals; second, a large language model generates a fine-grained emotion lexicon, and a structured emotion tree is constructed for semantic refinement; third, a similarity-aware interaction mechanism dynamically aligns visual features with hierarchical semantic representations. Extensive experiments on three benchmark GER datasets demonstrate that our method achieves state-of-the-art performance, significantly enhancing contextual awareness and semantic understanding depth.

Technology Category

Application Category

📝 Abstract
Group-level emotion recognition (GER) aims to identify holistic emotions within a scene involving multiple individuals. Current existed methods underestimate the importance of visual scene contextual information in modeling individual relationships. Furthermore, they overlook the crucial role of semantic information from emotional labels for complete understanding of emotions. To address this limitation, we propose a novel framework that incorporates visual scene context and label-guided semantic information to improve GER performance. It involves the visual context encoding module that leverages multi-scale scene information to diversely encode individual relationships. Complementarily, the emotion semantic encoding module utilizes group-level emotion labels to prompt a large language model to generate nuanced emotion lexicons. These lexicons, in conjunction with the emotion labels, are then subsequently refined into comprehensive semantic representations through the utilization of a structured emotion tree. Finally, similarity-aware interaction is proposed to align and integrate visual and semantic information, thereby generating enhanced group-level emotion representations and subsequently improving the performance of GER. Experiments on three widely adopted GER datasets demonstrate that our proposed method achieves competitive performance compared to state-of-the-art methods.
Problem

Research questions and friction points this paper is trying to address.

Enhancing group emotion recognition through scene context integration
Incorporating semantic label information for comprehensive emotion understanding
Aligning visual and semantic representations for improved emotion classification
Innovation

Methods, ideas, or system contributions that make the work stand out.

Leverages multi-scale scene context for relationship encoding
Uses emotion labels to generate nuanced semantic lexicons
Aligns visual and semantic information via similarity-aware interaction
🔎 Similar Papers
No similar papers found.