The Whole Is Bigger Than the Sum of Its Parts: Modeling Individual Annotators to Capture Emotional Variability

📅 2024-08-21
🏛️ Interspeech
📈 Citations: 1
Influential: 0
🤖 AI Summary
In speech emotion recognition, the subjectivity and inter-annotator variability inherent in multi-annotator labels are often obscured by simple label averaging, distorting the modeling of emotional dynamics. To address this, we propose an end-to-end multitask framework that jointly predicts individual annotators' emotion labels and continuous emotion distributions (e.g., kernel density estimates or parametric distributions) during training. This explicitly models annotator heterogeneity while preserving population-level variability. Our approach eliminates label averaging and instead integrates annotator modeling directly into the distribution learning process, enabling co-optimization of annotator-specific predictions and emotion distributions. Evaluated in both within-corpus and cross-corpus settings, our method achieves statistically significant improvements over prior approaches in emotion distribution prediction accuracy. It more faithfully captures emotion subjectivity and annotator disagreement, offering a principled way to model annotation uncertainty in affective computing.
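To make the core idea concrete, here is a minimal numpy sketch of the second ingredient the summary describes: turning a set of per-annotator continuous predictions into an emotion distribution via kernel density estimation. The annotator values, bandwidth, and valence range are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def gaussian_kde(samples, grid, bandwidth=0.1):
    """Evaluate a Gaussian kernel density estimate of `samples` on `grid`."""
    # Pairwise differences between grid points and samples: (len(grid), len(samples))
    diffs = grid[:, None] - samples[None, :]
    kernels = np.exp(-0.5 * (diffs / bandwidth) ** 2) / (bandwidth * np.sqrt(2 * np.pi))
    return kernels.mean(axis=1)  # average kernel -> density on the grid

# Hypothetical per-annotator valence predictions for one utterance,
# e.g. from one model output head per annotator (values are illustrative).
annotator_preds = np.array([0.2, 0.35, 0.3, 0.6])

grid = np.linspace(-1.0, 1.0, 201)            # assumed valence axis in [-1, 1]
density = gaussian_kde(annotator_preds, grid)
```

Because the distribution is built from annotator-level predictions rather than a single averaged label, disagreement between annotators shows up directly as multimodality or spread in the resulting density.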

📝 Abstract
Emotion expression and perception are nuanced, complex, and highly subjective processes. When multiple annotators label emotional data, the resulting labels contain high variability. Most speech emotion recognition tasks address this by averaging annotator labels as ground truth. However, this process omits the nuance of emotion and inter-annotator variability, which are important signals to capture. Previous work has attempted to learn distributions to capture emotion variability, but these methods also lose information about the individual annotators. We address these limitations by learning to predict individual annotators and by introducing a novel method to create distributions from continuous model outputs that permit the learning of emotion distributions during model training. We show that this combined approach can result in emotion distributions that are more accurate than those seen in prior work, in both within- and cross-corpus settings.
Problem

Research questions and friction points this paper is trying to address.

Modeling individual annotators to capture emotional variability
Predicting individual annotators to preserve annotation nuance
Creating distributions from continuous outputs for emotion recognition
Innovation

Methods, ideas, or system contributions that make the work stand out.

Predict individual annotators' emotional labels
Create distributions from continuous model outputs
Learn emotion distributions during model training
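The bullets above can be sketched as a soft (differentiable) histogram: continuous model outputs are softly assigned to bins, yielding a distribution that can be compared against a label distribution with a divergence loss during training. The bin layout, temperature, and values below are assumptions for illustration, not the paper's actual formulation.

```python
import numpy as np

def soft_histogram(values, bin_centers, temperature=0.05):
    """Map continuous values to a soft (differentiable) histogram over bins."""
    # Soft assignment of each value to every bin: softmax over -squared distance.
    logits = -((values[:, None] - bin_centers[None, :]) ** 2) / temperature
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    assign = np.exp(logits)
    assign /= assign.sum(axis=1, keepdims=True)
    return assign.mean(axis=0)                    # average -> histogram over bins

def kl_divergence(p, q, eps=1e-8):
    """KL(p || q); usable as a distribution-matching training loss."""
    return float(np.sum(p * np.log((p + eps) / (q + eps))))

bin_centers = np.linspace(-1.0, 1.0, 11)             # discretized valence axis
model_outputs = np.array([0.18, 0.32, 0.29, 0.55])   # hypothetical model outputs
label_values = np.array([0.2, 0.35, 0.3, 0.6])       # hypothetical annotator labels

pred_dist = soft_histogram(model_outputs, bin_centers)
label_dist = soft_histogram(label_values, bin_centers)
loss = kl_divergence(label_dist, pred_dist)
```

Since every step is smooth, the same construction works inside a gradient-based training loop, which is what allows the emotion distribution to be learned end to end rather than fixed after training.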
James Tavernor
University of Michigan, USA
Yara El-Tawil
University of Michigan, USA
E. Provost
University of Michigan, USA