MSP-Conversation: A Corpus for Naturalistic, Time-Continuous Emotion Recognition

📅 2026-03-23
🤖 AI Summary
This work addresses a key limitation in speech emotion recognition research: the scarcity of large-scale, naturalistic datasets annotated with time-continuous emotional labels. The authors introduce a corpus of more than 70 hours of podcast conversations with time-continuous annotations of valence, arousal, and dominance made in conversational context, complemented by detailed speaker diarization. The corpus is intended to meet the data demands of deep learning approaches and to support modeling of dynamic emotional trajectories. The audio overlaps with a subset of the MSP-Podcast corpus, which allows in-context and out-of-context annotations to be compared directly, and baseline experiments are reported for dynamic emotion recognition in naturalistic settings.

📝 Abstract
Affective computing aims to understand and model human emotions for computational systems. Within this field, speech emotion recognition (SER) focuses on predicting emotions conveyed through speech. While early SER systems relied on limited datasets and traditional machine learning models, recent deep learning approaches demand large-scale, naturalistic emotional corpora. To address this need, we introduce the MSP-Conversation corpus: a dataset of more than 70 hours of conversational audio with time-continuous emotional annotations and detailed speaker diarizations. The time-continuous annotations capture the dynamic and context-dependent nature of emotional expression. The annotations in the corpus include fine-grained temporal traces of valence, arousal, and dominance. The audio data is sourced from publicly available podcasts and overlaps with a subset of the isolated speaking turns in the MSP-Podcast corpus to facilitate direct comparisons between annotation methods (i.e., in-context versus out-of-context annotations). The paper outlines the development of the corpus, annotation methodology, analyses of the annotations, and baseline SER experiments, establishing the MSP-Conversation corpus as a valuable resource for advancing research in dynamic SER in naturalistic settings.
Problem

Research questions and friction points this paper is trying to address.

speech emotion recognition
naturalistic corpus
time-continuous annotation
emotional dynamics
conversational audio
Innovation

Methods, ideas, or system contributions that make the work stand out.

time-continuous emotion annotation
naturalistic conversational corpus
speech emotion recognition
context-dependent emotion modeling
speaker diarization
Luz Martinez-Lucas
University of Texas at Dallas
Machine Learning, Speech Emotion Recognition, Artificial Intelligence, Speech Processing
Pravin Mote
Erik Jonsson School of Engineering and Computer Science, University of Texas at Dallas, Richardson, TX 75080 USA and Language Technologies Institute, Carnegie Mellon University, Pittsburgh PA-15213 USA
Abinay Reddy Naini
Visiting PhD Candidate of Language Technologies Institute - Carnegie Mellon University
Affective Computing, Machine Learning, Speech, Multimodal signal processing
Mohammed Abdelwahab
AT&T Labs Research, Bedminster, NJ USA
Carlos Busso
Language Technologies Institute, Carnegie Mellon University, Pittsburgh PA-15213 USA