Anatomy of a Feeling: Narrating Embodied Emotions via Large Vision-Language Models

📅 2025-09-23
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Existing emotion recognition models heavily rely on facial regions, rendering them fragile under occlusions and other real-world conditions. To address this limitation, we propose ELENA, the first framework leveraging Large Vision-Language Models (LVLMs) to automatically identify salient bodily regions—beyond the face—from whole-body affective responses and generate multi-level, embodied emotion narratives. Our method integrates attention mapping with multi-granularity text generation, enabling end-to-end emotion reasoning without fine-tuning. Experiments demonstrate that ELENA effectively mitigates facial attention bias, significantly outperforming state-of-the-art baselines on emotion recognition from face-occluded images. Moreover, the generated narratives exhibit both interpretability and physiological plausibility. This work establishes a novel paradigm for embodied emotion modeling, shifting focus from isolated facial cues to holistic, body-grounded affective understanding.

📝 Abstract
The embodiment of emotional reactions from body parts contains rich information about our affective experiences. We propose a framework that utilizes state-of-the-art large vision-language models (LVLMs) to generate Embodied LVLM Emotion Narratives (ELENA). These are well-defined, multi-layered text outputs, primarily comprising descriptions that focus on the salient body parts involved in emotional reactions. We also employ attention maps and observe that contemporary models exhibit a persistent bias towards the facial region. Despite this limitation, we observe that our employed framework can effectively recognize embodied emotions in face-masked images, outperforming baselines without any fine-tuning. ELENA opens a new trajectory for embodied emotion analysis across the modality of vision and enriches modeling in an affect-aware setting.
Problem

Research questions and friction points this paper is trying to address.

Generating embodied emotion narratives from body parts using vision-language models
Addressing facial bias in emotion recognition models through multi-layered analysis
Enabling emotion recognition in face-masked images without model fine-tuning
Innovation

Methods, ideas, or system contributions that make the work stand out.

LVLMs generate embodied emotion narratives
Framework uses attention maps for bias analysis
Recognizes emotions effectively in face-masked images
Mohammad Saim
University of Cincinnati
Phan Anh Duong
University of Cincinnati
natural language processing
Cat Luong
University of Cincinnati
Natural Language Processing · Machine Learning · Deep Learning
Aniket Bhanderi
University of Cincinnati
Tianyu Jiang
University of Cincinnati