EM2LDL: A Multilingual Speech Corpus for Mixed Emotion Recognition through Label Distribution Learning

📅 2025-11-25
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing emotional speech corpora are predominantly monolingual and single-label. This limits their ability to model mixed emotions and code-switching in authentic contexts, which in turn compromises ecological validity and cross-lingual generalizability. To address this, we introduce a novel multilingual mixed-emotion speech corpus covering English, Mandarin, and Cantonese. Its fine-grained, multi-label emotion annotations are expressed as label distributions, so co-occurring emotions are preserved rather than collapsed into a single class. The corpus is built from spontaneous speech collected from online platforms and annotated with emotion distributions across 32 categories. We conduct speaker-independent benchmark experiments using self-supervised models and demonstrate robust performance across gender, age, and personality subgroups, with HuBERT-large-EN performing best. Both the corpus and the baseline code are publicly released, offering a testbed for more adaptive and empathetic affective computing systems.
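A speaker-independent protocol groups utterances by speaker, so that no speaker contributes to both training and test data. The sketch below illustrates this. The `(utterance_id, speaker_id)` input format and the round-robin fold assignment are assumptions for illustration, not the EM2LDL baseline's actual split procedure.

```python
from collections import defaultdict


def speaker_independent_folds(utterances, n_folds):
    """Split utterance IDs into folds so that every speaker falls in exactly one fold.

    `utterances` is a list of (utterance_id, speaker_id) pairs (hypothetical format).
    Speakers are sorted and assigned to folds round-robin, so the split is deterministic.
    """
    by_speaker = defaultdict(list)
    for utt_id, speaker in utterances:
        by_speaker[speaker].append(utt_id)

    folds = [[] for _ in range(n_folds)]
    for i, speaker in enumerate(sorted(by_speaker)):
        # All of a speaker's utterances go to the same fold.
        folds[i % n_folds].extend(by_speaker[speaker])
    return folds


def train_test_for_fold(folds, k):
    """Use fold k as the test set and all remaining folds as training data."""
    test = list(folds[k])
    train = [utt for i, fold in enumerate(folds) if i != k for utt in fold]
    return train, test
```

Speakers, not utterances, are distributed across folds. As a result, a model evaluated on fold k has never heard any of its test speakers during training.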

📝 Abstract
This study introduces EM2LDL, a novel multilingual speech corpus designed to advance mixed emotion recognition through label distribution learning. It addresses the limitations of predominantly monolingual and single-label emotion corpora, which restrict linguistic diversity, cannot model mixed emotions, and lack ecological validity. EM2LDL comprises expressive utterances in English, Mandarin, and Cantonese, capturing the intra-utterance code-switching prevalent in multilingual regions such as Hong Kong and Macao. The corpus integrates spontaneous emotional expressions from online platforms, annotated with fine-grained emotion distributions across 32 categories. Experimental baselines using self-supervised learning models demonstrate robust performance in speaker-independent gender-, age-, and personality-based evaluations, with HuBERT-large-EN achieving the best results. By incorporating linguistic diversity and ecological validity, EM2LDL enables the exploration of complex emotional dynamics in multilingual settings. This work provides a versatile testbed for developing adaptive, empathetic systems for affective computing applications, including mental health monitoring and cross-cultural communication. The dataset, annotations, and baseline code are publicly available at https://github.com/xingfengli/EM2LDL.
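Per-annotator, multi-label choices could be turned into a label distribution as follows. The function counts one vote per selected category per annotator, then normalizes the counts to sum to 1. The vote format, the four-category subset and the counting rule are all hypothetical. The abstract does not specify how the 32-category distributions were derived.

```python
from collections import Counter


def votes_to_distribution(votes, categories):
    """Convert annotator selections into a probability vector over `categories`.

    `votes` is a list with one entry per annotator; each entry is the collection of
    categories that annotator selected (hypothetical multi-label format).
    Selections outside `categories` are ignored.
    """
    counts = Counter()
    for selected in votes:
        counts.update(set(selected))  # an annotator counts at most once per category
    total = sum(counts[c] for c in categories)
    if total == 0:
        raise ValueError("no votes for any known category")
    return [counts[c] / total for c in categories]
```

Consider four annotators choosing {joy, surprise}, {joy}, {surprise, joy} and {sadness}. Over (joy, sadness, surprise, anger), these votes yield roughly [0.5, 0.167, 0.333, 0.0]. This keeps the mixed joy/surprise reading that a single majority label would discard.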
Problem

Research questions and friction points this paper is trying to address.

Addresses limitations of monolingual single-label emotion corpora
Enables mixed emotion recognition through label distribution learning
Captures multilingual emotional dynamics with ecological validity
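Label distribution learning trains a model to predict a full probability vector rather than a single class. A common objective is the KL divergence between the annotated distribution and the model's softmax output. This is an assumption here, since the source does not name the loss used by the baselines.

```python
import math


def softmax(logits):
    """Numerically stable softmax: subtract the max logit before exponentiating."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]


def kl_divergence(target, predicted, eps=1e-12):
    """KL(target || predicted) in nats.

    Categories with zero target probability contribute nothing;
    `eps` guards against log(0) when the model assigns zero probability.
    """
    return sum(
        t * math.log(t / max(p, eps))
        for t, p in zip(target, predicted)
        if t > 0
    )
```

A prediction that matches the target exactly gives a loss of 0. With two classes, predicting an even split for an utterance annotated entirely as one category gives log 2.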
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multilingual speech corpus for mixed emotion recognition
Label distribution learning with 32 emotion categories
Self-supervised models achieving robust cross-demographic performance
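Reporting performance per demographic subgroup means grouping test utterances by an attribute, such as gender, age band or personality trait. A distribution-level metric is then averaged within each group. The sketch below uses two measures common in label distribution learning:

- Chebyshev distance (lower is better).
- Cosine similarity (higher is better).

The record format and the choice of metrics are assumptions, not necessarily what the EM2LDL benchmark reports.

```python
import math
from collections import defaultdict


def chebyshev(p, q):
    """Largest absolute per-category difference between two distributions."""
    return max(abs(a - b) for a, b in zip(p, q))


def cosine(p, q):
    """Cosine similarity between two distributions (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(p, q))
    norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    return dot / norm if norm else 0.0


def scores_by_group(records):
    """Average each metric per subgroup.

    `records` yields (group, target_distribution, predicted_distribution) tuples
    (hypothetical format); returns {group: {"chebyshev": mean, "cosine": mean}}.
    """
    sums = defaultdict(lambda: [0.0, 0.0, 0])  # [chebyshev sum, cosine sum, count]
    for group, target, pred in records:
        acc = sums[group]
        acc[0] += chebyshev(target, pred)
        acc[1] += cosine(target, pred)
        acc[2] += 1
    return {
        g: {"chebyshev": ch / n, "cosine": co / n}
        for g, (ch, co, n) in sums.items()
    }
```

Comparing these per-group means is one way to check whether performance holds up across gender, age and personality subgroups.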
Authors

Xingfeng Li
Faculty of Data Science, City University of Macau, Macau 999078, China

Xiaohan Shi
Graduate School of Informatics, Nagoya University, Nagoya 464-8601, Japan

Junjie Li
Faculty of Data Science, City University of Macau, Macau 999078, China

Yongwei Li
Institute of Psychology, Chinese Academy of Sciences, Beijing 100101, China

Masashi Unoki
Graduate School of Advanced Science and Technology, Japan Advanced Institute of Science and Technology, Nomi 923-1292, Japan

Tomoki Toda
Nagoya University

Masato Akagi
Graduate School of Advanced Science and Technology, Japan Advanced Institute of Science and Technology, Nomi 923-1292, Japan