🤖 AI Summary
To address the scarcity of multimodal emotion dialogue datasets for low-resource African languages such as Akan, this work introduces the Akan Conversation Emotion (ACE) dataset, the first multimodal emotion recognition dataset built from Akan film dialogues. It comprises 385 dialogues and 6,162 audio-video-text utterances, and features the first word-level prosodic prominence annotations for the language. The dataset integrates human-annotated audio (prosodic features), visual (facial expressions), and textual (dialogue content) modalities, with cross-modal alignment and collaborative verification ensuring annotation quality. As the first prosodically annotated emotional resource for an African language, ACE fills a critical gap in multimodal sentiment analysis for low-resource languages. The authors establish robust baselines with mainstream multimodal models, demonstrating the dataset's quality and practical utility for advancing inclusive, culturally grounded NLP research.
📝 Abstract
In this paper, we introduce the Akan Conversation Emotion (ACE) dataset, the first multimodal emotion dialogue dataset for an African language, addressing the significant lack of emotion recognition resources for low-resource languages. Developed for the Akan language, ACE contains 385 emotion-labeled dialogues and 6,162 utterances across audio, visual, and textual modalities, along with word-level prosodic prominence annotations. These prosodic labels also make ACE the first prosodically annotated African-language dataset. We demonstrate the quality and utility of ACE through experiments with state-of-the-art emotion recognition methods, establishing solid baselines for future research. We hope ACE inspires further work on inclusive, linguistically and culturally diverse NLP resources.