Knowledge-Aligned Counterfactual-Enhancement Diffusion Perception for Unsupervised Cross-Domain Visual Emotion Recognition

📅 2025-05-26
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
This work addresses unsupervised cross-domain visual emotion recognition (UCDVER): transferring emotion knowledge from a source domain (e.g., realistic images) to a low-resource target domain (e.g., stickers), where substantial variability in emotional expression and a severe affective distribution shift impede generalization. The proposed Knowledge-aligned Counterfactual-enhancement Diffusion Perception (KCDP) framework works in three steps. First, a vision-language model (VLM) aligns emotional representations from both domains in a shared knowledge space. Second, a counterfactual-reasoning-driven pseudo-labeling method, Counterfactual-Enhanced Language-image Emotional Alignment (CLIEA), generates high-quality pseudo-labels for the unlabeled target domain. Third, the aligned representations guide a diffusion model for fine-grained visual affective perception. On the UCDVER benchmark, KCDP surpasses the state-of-the-art VER model TGCA-PVT by 12%, demonstrating stronger cross-domain generalization and emotion perception.

๐Ÿ“ Abstract
Visual Emotion Recognition (VER) is a critical yet challenging task aimed at inferring emotional states of individuals based on visual cues. However, existing works focus on single domains, e.g., realistic images or stickers, limiting VER models' cross-domain generalizability. To fill this gap, we introduce an Unsupervised Cross-Domain Visual Emotion Recognition (UCDVER) task, which aims to generalize visual emotion recognition from the source domain (e.g., realistic images) to the low-resource target domain (e.g., stickers) in an unsupervised manner. Compared to the conventional unsupervised domain adaptation problems, UCDVER presents two key challenges: a significant emotional expression variability and an affective distribution shift. To mitigate these issues, we propose the Knowledge-aligned Counterfactual-enhancement Diffusion Perception (KCDP) framework. Specifically, KCDP leverages a VLM to align emotional representations in a shared knowledge space and guides diffusion models for improved visual affective perception. Furthermore, a Counterfactual-Enhanced Language-image Emotional Alignment (CLIEA) method generates high-quality pseudo-labels for the target domain. Extensive experiments demonstrate that our model surpasses SOTA models in both perceptibility and generalization, e.g., gaining 12% improvements over the SOTA VER model TGCA-PVT. The project page is at https://yinwen2019.github.io/ucdver.
Problem

Research questions and friction points this paper is trying to address.

Enhancing cross-domain visual emotion recognition without supervision
Addressing emotional expression variability and distribution shifts
Improving generalization from source to low-resource target domains
Innovation

Methods, ideas, or system contributions that make the work stand out.

VLM aligns emotional representations in shared space
Diffusion models enhance visual affective perception
Counterfactual method generates high-quality pseudo-labels
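The VLM-alignment idea behind the pseudo-labeling bullet can be illustrated with a minimal sketch. This is not the paper's CLIEA method (which adds counterfactual enhancement on top); it shows only the underlying zero-shot step: score target-domain image embeddings against emotion text-prompt embeddings and keep high-confidence predictions as pseudo-labels. The emotion list, temperature, and threshold below are illustrative assumptions, not values from the paper.

```python
# Illustrative sketch of VLM-style zero-shot pseudo-labeling (NOT the paper's
# CLIEA): cosine similarity between image and emotion-prompt embeddings,
# softmax over emotions, and a confidence threshold for accepting labels.
import numpy as np

# Mikels' eight emotion categories, a common VER label set (assumed here).
EMOTIONS = ["amusement", "anger", "awe", "contentment",
            "disgust", "excitement", "fear", "sadness"]

def pseudo_label(image_embs, text_embs, threshold=0.5, temperature=100.0):
    """Return an emotion index per image, or -1 when the softmax
    confidence over emotion prompts falls below `threshold`."""
    # Normalize so dot products become cosine similarities.
    img = image_embs / np.linalg.norm(image_embs, axis=1, keepdims=True)
    txt = text_embs / np.linalg.norm(text_embs, axis=1, keepdims=True)
    logits = (img @ txt.T) * temperature        # (n_images, n_emotions)
    exp = np.exp(logits - logits.max(axis=1, keepdims=True))
    probs = exp / exp.sum(axis=1, keepdims=True)
    labels = probs.argmax(axis=1)
    conf = probs.max(axis=1)
    # Low-confidence samples get -1 and would be excluded from training.
    return np.where(conf >= threshold, labels, -1)

# Stand-in random embeddings; a real pipeline would use VLM encoders.
rng = np.random.default_rng(0)
image_embs = rng.normal(size=(4, 512))
text_embs = rng.normal(size=(len(EMOTIONS), 512))
labels = pseudo_label(image_embs, text_embs)
```

Thresholding out low-confidence samples is what makes such pseudo-labels usable for adaptation: only target images the VLM scores decisively enter training, which is the label-quality concern CLIEA's counterfactual reasoning is designed to improve further.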
Wen Yin
The Laboratory of Intelligent Collaborative Computing of UESTC
Yong Wang
The Laboratory of Intelligent Collaborative Computing of UESTC
Guiduo Duan
The Laboratory of Intelligent Collaborative Computing of UESTC, Ubiquitous Intelligence and Trusted Services Key Laboratory of Sichuan Province
Dongyang Zhang
University of Electronic Science and Technology of China
Image restoration, super-resolution
Xin Hu
The Laboratory of Intelligent Collaborative Computing of UESTC
Yuan-Fang Li
Oracle | Monash University
Large language models, knowledge graphs, natural language processing
Tao He
The Laboratory of Intelligent Collaborative Computing of UESTC, Ubiquitous Intelligence and Trusted Services Key Laboratory of Sichuan Province