EmotiCrafter: Text-to-Emotional-Image Generation based on Valence-Arousal Model

📅 2025-01-10
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing text-to-image generation methods struggle to model complex, continuous emotional variations and lack fine-grained control over emotional intensity, specifically along the valence-arousal (V-A) dimensions. This paper introduces the Continuous Emotional Image Content Generation (C-EICG) task, the first to jointly leverage V-A space and textual prompts for image synthesis. Methodologically, the authors design a V-A emotion-embedding mapping network that nonlinearly aligns emotional dimensions with textual semantics, and incorporate an emotion-consistency contrastive loss alongside a reconstruction loss for joint optimization. Built on a diffusion-model framework, the approach significantly outperforms discrete-emotion baselines across multiple benchmarks: FID improves by 18.3% (indicating enhanced emotional fidelity), CLIP-Score rises by 12.7% (reflecting superior content preservation), and precise V-A-controllable generation is achieved, establishing new state-of-the-art performance in emotional image generation.
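The summary names two training terms, an emotion-consistency contrastive loss and a reconstruction loss, without giving their form. The sketch below shows one plausible reading: an InfoNCE-style contrastive term over emotion embeddings plus a mean-squared reconstruction term. The functions, the 0.5 weight, and the 16-dimensional embeddings are all illustrative assumptions, not the paper's actual formulation.

```python
import numpy as np

def reconstruction_loss(pred, target):
    """MSE reconstruction term (simplified stand-in for a diffusion-style objective)."""
    return float(np.mean((pred - target) ** 2))

def emotion_contrastive_loss(anchor, positive, negatives, tau=0.1):
    """InfoNCE-style term: pull same-emotion embeddings together,
    push different-emotion embeddings apart (one plausible form; details assumed)."""
    def cos(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))
    pos = np.exp(cos(anchor, positive) / tau)
    neg = sum(np.exp(cos(anchor, n) / tau) for n in negatives)
    return float(-np.log(pos / (pos + neg)))

rng = np.random.default_rng(1)
a, p = rng.standard_normal(16), rng.standard_normal(16)
negs = [rng.standard_normal(16) for _ in range(4)]
# Joint objective with an assumed weighting of 0.5 on the contrastive term.
total = reconstruction_loss(a, p) + 0.5 * emotion_contrastive_loss(a, p, negs)
print(total)
```

Both terms are non-negative, so the joint objective is minimized only when the reconstruction is accurate and same-emotion embeddings score higher than different-emotion ones.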

📝 Abstract
Recent research shows that emotions can enhance users' cognition and influence information communication. While research on visual emotion analysis is extensive, limited work has been done on helping users generate emotionally rich image content. Existing work on emotional image generation relies on discrete emotion categories, making it challenging to capture complex and subtle emotional nuances accurately. Additionally, these methods struggle to control the specific content of generated images based on text prompts. In this work, we introduce the new task of continuous emotional image content generation (C-EICG) and present EmotiCrafter, an emotional image generation model that generates images based on text prompts and Valence-Arousal values. Specifically, we propose a novel emotion-embedding mapping network that embeds Valence-Arousal values into textual features, enabling the capture of specific emotions in alignment with intended input prompts. Additionally, we introduce a loss function to enhance emotion expression. The experimental results show that our method effectively generates images representing specific emotions with the desired content and outperforms existing techniques.
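The abstract's emotion-embedding mapping network is not specified in this listing. The sketch below illustrates one plausible reading, assuming a small MLP that maps a (valence, arousal) pair to an offset added to CLIP-style text features; the function names, the 768-dimensional feature width, and the layer sizes are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp_init(sizes, rng):
    """Random weights for a small MLP (hypothetical stand-in for the mapping network)."""
    return [(rng.standard_normal((m, n)) * 0.02, np.zeros(n))
            for m, n in zip(sizes, sizes[1:])]

def map_va_to_offset(va, params):
    """Map a (valence, arousal) pair in [-1, 1]^2 into text-embedding space."""
    h = np.asarray(va, dtype=float)
    for i, (w, b) in enumerate(params):
        h = h @ w + b
        if i < len(params) - 1:
            h = np.tanh(h)  # nonlinearity: nonlinear alignment of V-A with text semantics
    return h

EMBED_DIM = 768  # CLIP-like text feature width (assumption)
params = mlp_init([2, 64, EMBED_DIM], rng)

text_features = rng.standard_normal(EMBED_DIM)  # placeholder prompt embedding
# High valence, mildly low arousal, injected into the prompt features.
emotional_features = text_features + map_va_to_offset([0.8, -0.3], params)
print(emotional_features.shape)
```

Conditioning a diffusion model on `emotional_features` instead of `text_features` is how such an offset could steer generation toward a target emotion while preserving the prompt's content.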
Problem

Research questions and friction points this paper is trying to address.

Emotion Expression
Image Generation
Text-to-Image Control
Innovation

Methods, ideas, or system contributions that make the work stand out.

EmotiCrafter
Affective Image Generation
Emotional Intensity Integration
Yi He
iDVX Lab, Tongji University, Shanghai, China
Shengqi Dang
iDVX Lab, Tongji University, Shanghai, China
Long Ling
Tongji University, Shanghai, China
Human AI Interaction, HCI, Digital Fabrication
Ziqing Qian
iDVX Lab, Tongji University, Shanghai, China
Nanxuan Zhao
Adobe Research, California, USA
Nan Cao
Professor, Intelligent Big Data Visualization Lab @ Tongji University
Visual Analytics, Information Visualization, Visualization, Human-Computer Interaction