EASL: Multi-Emotion Guided Semantic Disentanglement for Expressive Sign Language Generation

📅 2025-11-27
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Current sign language generation methods prioritize semantic accuracy while neglecting emotional expressiveness, resulting in synthesized videos with limited naturalness and vitality. To address this, we propose a multi-emotion-guided semantic disentanglement framework that explicitly separates emotion- and semantics-related features via a dedicated disentanglement module and employs a progressive training strategy for fine-grained emotion integration. In pose generation, we model confidence scores across seven distinct emotion categories and leverage a diffusion-based decoder to synthesize high-fidelity sign language pose sequences. To our knowledge, this is the first work to jointly incorporate explicit emotion control and semantic disentanglement into a text-to-sign generation pipeline. Evaluated on mainstream benchmarks, our method achieves significantly higher pose accuracy than all baselines while producing videos that simultaneously preserve semantic correctness and rich emotional expressiveness—thereby enhancing natural communication experiences for deaf users.

Technology Category

Application Category

📝 Abstract
Large language models have revolutionized sign language generation by automatically transforming text into high-quality sign language videos, providing accessible communication for the Deaf community. However, existing LLM-based approaches prioritize semantic accuracy while overlooking emotional expressions, resulting in outputs that lack naturalness and expressiveness. We propose EASL (Emotion-Aware Sign Language), a multi-emotion-guided generation architecture for fine-grained emotional integration. We introduce emotion-semantic disentanglement modules with progressive training to separately extract semantic and affective features. During pose decoding, the emotional representations guide semantic interaction to generate sign poses with 7-class emotion confidence scores, enabling emotional expression recognition. Experimental results demonstrate that EASL achieves pose accuracy superior to all compared baselines by integrating multi-emotion information and effectively adapts to diffusion models to generate expressive sign language videos.
Problem

Research questions and friction points this paper is trying to address.

Overlooks emotional expressions in sign language generation
Lacks naturalness and expressiveness in generated outputs
Needs fine-grained emotional integration for expressive videos
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-emotion guided semantic disentanglement architecture
Progressive training for separate semantic and affective feature extraction
Emotional representation guided pose decoding with confidence scores
🔎 Similar Papers
No similar papers found.
Yanchao Zhao
Yanchao Zhao
Nanjing University of Aeronautics and Astronautics
Computer Networks
J
Jihao Zhu
School of Language, Literature, Music and Visual Culture, The University of Aberdeen
Y
Yu Liu
Institute of Information Engineering, Chinese Academy of Sciences; School of Cyber Security, University of Chinese Academy of Sciences
W
Weizhuo Chen
Institute of Information Engineering, Chinese Academy of Sciences; School of Cyber Security, University of Chinese Academy of Sciences
Y
Yuling Yang
Institute of Information Engineering, Chinese Academy of Sciences; School of Cyber Security, University of Chinese Academy of Sciences
K
Kun Peng
Institute of Information Engineering, Chinese Academy of Sciences; School of Cyber Security, University of Chinese Academy of Sciences