🤖 AI Summary
This work addresses the scarcity of spontaneous emotional speech data collected in naturalistic settings, since most existing datasets rely on acted or laboratory-induced emotions. To bridge this gap, the authors build a multimodal dataset of spontaneous emotional speech elicited by real match outcomes, synchronized with the fine-grained micro-gesture annotations of the original iMiGUE corpus. The dataset includes transcribed utterances, speaker-role separation, and word-level alignments, enabling analysis across modalities. Using pretrained acoustic and language models, the study establishes a dual-modality (speech and text) benchmark for emotion recognition. Experimental results show that the dataset captures genuine, spontaneous affective states, making it a useful resource for research in naturalistic emotion modeling.
📝 Abstract
This work presents iMiGUE-Speech, an extension of the iMiGUE dataset that provides a spontaneous speech corpus for studying emotional and affective states. The new release enriches the original dataset with additional metadata, including speech transcripts, speaker-role separation between interviewer and interviewee, and word-level forced alignments. Unlike existing emotional speech datasets that rely on acted or laboratory-elicited emotions, iMiGUE-Speech captures spontaneous affect arising naturally from real match outcomes. To demonstrate the utility of the dataset and establish initial benchmarks, we introduce two evaluation tasks: speech emotion recognition and transcript-based sentiment analysis. These tasks use state-of-the-art pre-trained representations to assess how well the dataset captures spontaneous affective states from both the acoustic and linguistic modalities. iMiGUE-Speech can also be synchronized with the micro-gesture annotations from the original iMiGUE dataset, forming a uniquely multimodal resource for studying speech-gesture affective dynamics. The extended dataset is available at https://github.com/CV-AC/imigue-speech.
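To make the two benchmark tasks concrete, the sketch below runs off-the-shelf pretrained models from the Hugging Face `transformers` library on a single utterance. This is a minimal illustration, not the authors' released benchmark code: the checkpoint names, file path, and transcript are assumptions, and the dataset's actual schema may differ.

```python
# Minimal sketch of the two benchmark tasks using `transformers` pipelines.
# The checkpoints, file path, and transcript below are illustrative
# assumptions, not the authors' released setup.
from transformers import pipeline

# Speech emotion recognition: a public wav2vec 2.0 checkpoint fine-tuned
# for emotion recognition stands in for the pretrained acoustic model.
ser = pipeline("audio-classification",
               model="superb/wav2vec2-base-superb-er")  # assumed checkpoint

# Transcript-based sentiment analysis with a pretrained language model.
sentiment = pipeline("sentiment-analysis")  # library's default checkpoint

# Hypothetical interviewee utterance from the corpus.
utterance_wav = "clips/example_interviewee_utterance.wav"  # illustrative path
transcript = "I really didn't expect to win that match."

print(ser(utterance_wav))     # e.g. [{'label': 'hap', 'score': 0.91}, ...]
print(sentiment(transcript))  # e.g. [{'label': 'POSITIVE', 'score': 0.99}]
```

Because the release also provides speaker-role labels and word-level forced alignments, the same interface extends naturally to role-filtered evaluation (e.g., interviewee-only utterances) or to scoring aligned word segments.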