🤖 AI Summary
This study addresses a gap in culturally aware emotion recognition for conversational AI: the underrepresentation of Black African communities, which undermines ethical performance and system trustworthiness. To bridge this gap, the work proposes a multimodal approach that integrates speech and facial image data and introduces a new Audio-Frame Mean Expression (AFME) algorithm. The model employs a three-layer convolutional neural network to recognize the seven basic emotions and to detect sarcasm, accounting for cultural, regional, and contextual nuances. Reported accuracies range between 85% and 96%, supporting the adaptability, precision, and reliability of conversational AI systems in this specific cultural context.
📝 Abstract
Valuable decisions and high-priority analyses now depend on applications such as facial biometrics, social media photo tagging, and human-robot interaction. However, the successful deployment of such applications depends on how well they perform on tested use cases, including possible edge cases. Over the years, many generalized solutions have been implemented to mimic human emotions, including sarcasm. However, factors such as geographical location and cultural difference have not been fully explored, despite their relevance to resolving ethical issues and improving conversational artificial intelligence (AI). In this paper, we seek to address the potential challenges in the use of conversational AI within Black African society. We develop an emotion prediction model with accuracies ranging between 85% and 96%. Our model combines speech and image data to detect the seven basic emotions, with an additional focus on identifying sarcasm. It uses a three-layer convolutional neural network together with a new Audio-Frame Mean Expression (AFME) algorithm and focuses on the model pre-processing and post-processing stages. Ultimately, our proposed solution contributes to maintaining the credibility of emotion recognition systems in conversational AI.