Knowledge-Aligned Counterfactual-Enhancement Diffusion Perception for Unsupervised Cross-Domain Visual Emotion Recognition

📅 2025-05-26
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
This work addresses unsupervised cross-domain visual emotion recognition (UCDVER): transferring emotion knowledge from a source domain (e.g., realistic images) to a low-resource target domain (e.g., stickers), where substantial variability in emotional expression and a severe affective distribution shift impede generalization. The proposed Knowledge-aligned Counterfactual-enhancement Diffusion Perception (KCDP) framework works in three steps. First, a vision-language model (VLM) aligns emotional representations from both domains in a shared knowledge space. Second, a counterfactual-reasoning-driven pseudo-labeling method, Counterfactual-Enhanced Language-image Emotional Alignment (CLIEA), generates high-quality pseudo-labels for the unlabeled target domain. Third, the aligned representations guide a diffusion model for fine-grained visual affective perception. On the UCDVER benchmark, KCDP surpasses the state-of-the-art VER model TGCA-PVT by 12%, demonstrating stronger cross-domain generalization and emotion perception.

๐Ÿ“ Abstract
Visual Emotion Recognition (VER) is a critical yet challenging task aimed at inferring emotional states of individuals based on visual cues. However, existing works focus on single domains, e.g., realistic images or stickers, limiting VER models' cross-domain generalizability. To fill this gap, we introduce an Unsupervised Cross-Domain Visual Emotion Recognition (UCDVER) task, which aims to generalize visual emotion recognition from the source domain (e.g., realistic images) to the low-resource target domain (e.g., stickers) in an unsupervised manner. Compared to the conventional unsupervised domain adaptation problems, UCDVER presents two key challenges: a significant emotional expression variability and an affective distribution shift. To mitigate these issues, we propose the Knowledge-aligned Counterfactual-enhancement Diffusion Perception (KCDP) framework. Specifically, KCDP leverages a VLM to align emotional representations in a shared knowledge space and guides diffusion models for improved visual affective perception. Furthermore, a Counterfactual-Enhanced Language-image Emotional Alignment (CLIEA) method generates high-quality pseudo-labels for the target domain. Extensive experiments demonstrate that our model surpasses SOTA models in both perceptibility and generalization, e.g., gaining 12% improvements over the SOTA VER model TGCA-PVT. The project page is at https://yinwen2019.github.io/ucdver.
Problem

Research questions and friction points this paper is trying to address.

Enhancing cross-domain visual emotion recognition without supervision
Addressing emotional expression variability and distribution shifts
Improving generalization from source to low-resource target domains
Innovation

Methods, ideas, or system contributions that make the work stand out.

VLM aligns emotional representations in shared space
Diffusion models enhance visual affective perception
Counterfactual method generates high-quality pseudo-labels
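The VLM-alignment idea behind the pseudo-labeling bullet can be illustrated with a minimal sketch. This is not the paper's CLIEA method (which adds counterfactual enhancement on top); it shows only the underlying zero-shot step: score target-domain image embeddings against emotion text-prompt embeddings and keep high-confidence predictions as pseudo-labels. The emotion list, temperature, and threshold below are illustrative assumptions, not values from the paper.

```python
# Illustrative sketch of VLM-style zero-shot pseudo-labeling (NOT the paper's
# CLIEA): cosine similarity between image and emotion-prompt embeddings,
# softmax over emotions, and a confidence threshold for accepting labels.
import numpy as np

# Mikels' eight emotion categories, a common VER label set (assumed here).
EMOTIONS = ["amusement", "anger", "awe", "contentment",
            "disgust", "excitement", "fear", "sadness"]

def pseudo_label(image_embs, text_embs, threshold=0.5, temperature=100.0):
    """Return an emotion index per image, or -1 when the softmax
    confidence over emotion prompts falls below `threshold`."""
    # Normalize so dot products become cosine similarities.
    img = image_embs / np.linalg.norm(image_embs, axis=1, keepdims=True)
    txt = text_embs / np.linalg.norm(text_embs, axis=1, keepdims=True)
    logits = (img @ txt.T) * temperature        # (n_images, n_emotions)
    exp = np.exp(logits - logits.max(axis=1, keepdims=True))
    probs = exp / exp.sum(axis=1, keepdims=True)
    labels = probs.argmax(axis=1)
    conf = probs.max(axis=1)
    # Low-confidence samples get -1 and would be excluded from training.
    return np.where(conf >= threshold, labels, -1)

# Stand-in random embeddings; a real pipeline would use VLM encoders.
rng = np.random.default_rng(0)
image_embs = rng.normal(size=(4, 512))
text_embs = rng.normal(size=(len(EMOTIONS), 512))
labels = pseudo_label(image_embs, text_embs)
```

Thresholding out low-confidence samples is what makes such pseudo-labels usable for adaptation: only target images the VLM scores decisively enter training, which is the label-quality concern CLIEA's counterfactual reasoning is designed to improve further.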
Wen Yin
The Laboratory of Intelligent Collaborative Computing of UESTC
Yong Wang
The Laboratory of Intelligent Collaborative Computing of UESTC
Guiduo Duan
The Laboratory of Intelligent Collaborative Computing of UESTC, Ubiquitous Intelligence and Trusted Services Key Laboratory of Sichuan Province
Dongyang Zhang
University of Electronic Science and Technology of China
Image restoration, super-resolution
Xin Hu
The Laboratory of Intelligent Collaborative Computing of UESTC
Yuan-Fang Li
Oracle | Monash University
Large language models, knowledge graphs, natural language processing
Tao He
The Laboratory of Intelligent Collaborative Computing of UESTC, Ubiquitous Intelligence and Trusted Services Key Laboratory of Sichuan Province