🤖 AI Summary
Current sign language generation methods prioritize semantic accuracy while neglecting emotional expressiveness, resulting in synthesized videos with limited naturalness and vitality. To address this, we propose a multi-emotion-guided semantic disentanglement framework that explicitly separates emotion-related and semantics-related features via a dedicated disentanglement module and employs a progressive training strategy for fine-grained emotion integration. During pose generation, we model confidence scores across seven distinct emotion categories and leverage a diffusion-based decoder to synthesize high-fidelity sign language pose sequences. To our knowledge, this is the first work to jointly incorporate explicit emotion control and semantic disentanglement into a text-to-sign generation pipeline. Evaluated on mainstream benchmarks, our method achieves significantly higher pose accuracy than all baselines while producing videos that simultaneously preserve semantic correctness and rich emotional expressiveness, thereby enhancing natural communication for Deaf users.
📝 Abstract
Large language models have revolutionized sign language generation by automatically transforming text into high-quality sign language videos, providing accessible communication for the Deaf community. However, existing LLM-based approaches prioritize semantic accuracy while overlooking emotional expression, resulting in outputs that lack naturalness and expressiveness. We propose EASL (Emotion-Aware Sign Language), a multi-emotion-guided generation architecture for fine-grained emotional integration. We introduce emotion-semantic disentanglement modules, trained progressively, to extract semantic and affective features separately. During pose decoding, the emotional representations guide the interaction with semantic features to generate sign poses annotated with 7-class emotion confidence scores, enabling recognition of the expressed emotion. Experimental results demonstrate that, by integrating multi-emotion information, EASL achieves pose accuracy superior to all compared baselines and adapts effectively to diffusion models to generate expressive sign language videos.