Anatomy of a Feeling: Narrating Embodied Emotions via Large Vision-Language Models

📅 2025-09-23
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Existing emotion recognition models heavily rely on facial regions, rendering them fragile under occlusions and other real-world conditions. To address this limitation, we propose ELENA, the first framework leveraging Large Vision-Language Models (LVLMs) to automatically identify salient bodily regions—beyond the face—from whole-body affective responses and generate multi-level, embodied emotion narratives. Our method integrates attention mapping with multi-granularity text generation, enabling end-to-end emotion reasoning without fine-tuning. Experiments demonstrate that ELENA effectively mitigates facial attention bias, significantly outperforming state-of-the-art baselines on emotion recognition from face-occluded images. Moreover, the generated narratives exhibit both interpretability and physiological plausibility. This work establishes a novel paradigm for embodied emotion modeling, shifting focus from isolated facial cues to holistic, body-grounded affective understanding.

📝 Abstract
The embodiment of emotional reactions from body parts contains rich information about our affective experiences. We propose a framework that utilizes state-of-the-art large vision-language models (LVLMs) to generate Embodied LVLM Emotion Narratives (ELENA). These are well-defined, multi-layered text outputs, primarily comprising descriptions that focus on the salient body parts involved in emotional reactions. We also employ attention maps and observe that contemporary models exhibit a persistent bias towards the facial region. Despite this limitation, we observe that our employed framework can effectively recognize embodied emotions in face-masked images, outperforming baselines without any fine-tuning. ELENA opens a new trajectory for embodied emotion analysis across the modality of vision and enriches modeling in an affect-aware setting.
Problem

Research questions and friction points this paper is trying to address.

Generating embodied emotion narratives from body parts using vision-language models
Addressing facial bias in emotion recognition models through multi-layered analysis
Enabling emotion recognition in face-masked images without model fine-tuning
Innovation

Methods, ideas, or system contributions that make the work stand out.

LVLMs generate embodied emotion narratives
Framework uses attention maps for bias analysis
Recognizes emotions effectively in face-masked images
Mohammad Saim
University of Cincinnati
Phan Anh Duong
University of Cincinnati
natural language processing
Cat Luong
University of Cincinnati
Natural Language Processing · Machine Learning · Deep Learning
Aniket Bhanderi
University of Cincinnati
Tianyu Jiang
University of Cincinnati