🤖 AI Summary
Existing sign language emotion recognition research is severely limited by the dual function of facial expressions and manual signs, which encode both grammatical structure and emotional content, leading to feature coupling and annotation ambiguity. Method: We introduce the first fine-grained annotated American Sign Language (ASL) emotion video dataset, comprising 200 multimodal samples labeled collaboratively by three Deaf signers with professional interpretation experience across six discrete emotions, sentiment polarity, and open-ended descriptions of emotion cues. By systematically disentangling facial and manual features shared between syntax and emotion, we establish the first ASL emotion recognition benchmark. Contribution/Results: We propose a ViT+LSTM+CLIP multimodal fusion baseline that achieves 58.3% accuracy on emotion classification, substantially outperforming random chance. The dataset is publicly released on Hugging Face, addressing a critical gap in sign language affective computing.
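The summary does not spell out how the ViT, LSTM, and CLIP components are combined, but a minimal late-fusion sketch is shown below. All dimensions, module choices, and the concatenation-based fusion are illustrative assumptions, not the authors' exact implementation: per-frame ViT features are aggregated over time by an LSTM, concatenated with a video-level CLIP embedding, and passed to a linear head over the six emotions.

```python
# Illustrative sketch only: dimensions, fusion strategy, and hyperparameters
# are assumptions, not the published EmoSign baseline.
import torch
import torch.nn as nn

class FusionBaseline(nn.Module):
    def __init__(self, frame_dim=768, clip_dim=512, hidden_dim=256, num_emotions=6):
        super().__init__()
        # LSTM aggregates per-frame ViT features over time
        self.temporal = nn.LSTM(frame_dim, hidden_dim,
                                batch_first=True, bidirectional=True)
        # Classifier over the temporal summary concatenated with a CLIP video embedding
        self.classifier = nn.Sequential(
            nn.Linear(2 * hidden_dim + clip_dim, hidden_dim),
            nn.ReLU(),
            nn.Dropout(0.3),
            nn.Linear(hidden_dim, num_emotions),
        )

    def forward(self, frame_feats, clip_feat):
        # frame_feats: (batch, num_frames, frame_dim) per-frame ViT features
        # clip_feat:   (batch, clip_dim) video-level CLIP embedding
        _, (h_n, _) = self.temporal(frame_feats)
        # Concatenate the final forward and backward hidden states
        temporal_summary = torch.cat([h_n[-2], h_n[-1]], dim=-1)
        fused = torch.cat([temporal_summary, clip_feat], dim=-1)
        return self.classifier(fused)

# Smoke test with random features standing in for real ViT/CLIP outputs
model = FusionBaseline()
logits = model(torch.randn(4, 32, 768), torch.randn(4, 512))
print(logits.shape)  # torch.Size([4, 6])
```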
📝 Abstract
Unlike spoken languages, where the use of prosodic features to convey emotion is well studied, indicators of emotion in sign language remain poorly understood, creating communication barriers in critical settings. Sign languages present unique challenges because facial expressions and hand movements simultaneously serve both grammatical and emotional functions. To address this gap, we introduce EmoSign, the first sign video dataset containing sentiment and emotion labels for 200 American Sign Language (ASL) videos. We also collect open-ended descriptions of emotion cues. Annotations were provided by three Deaf ASL signers with professional interpretation experience. Alongside the annotations, we include baseline models for sentiment and emotion classification. This dataset not only addresses a critical gap in existing sign language research but also establishes a new benchmark for understanding model capabilities in multimodal emotion recognition for sign languages. The dataset is made available at https://huggingface.co/datasets/catfang/emosign.
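Since the data is hosted on the Hugging Face Hub, it can presumably be loaded with the `datasets` library as sketched below; the repo id comes from the URL above, but the split and column names are assumptions to verify against the dataset card.

```python
# Sketch of loading EmoSign from the Hugging Face Hub.
# The repo id is taken from the dataset URL; field names are assumptions,
# so inspect the printed schema rather than relying on this comment.
from datasets import load_dataset

ds = load_dataset("catfang/emosign")
print(ds)  # available splits and their columns

first_split = list(ds.keys())[0]
sample = ds[first_split][0]
print(sample.keys())  # e.g., video reference, emotion/sentiment labels, cue descriptions
```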