AfriSpeech-MultiBench: A Verticalized Multidomain Multicountry Benchmark Suite for African Accented English ASR

📅 2025-11-18
📈 Citations: 0
Influential: 0
🤖 AI Summary
A lack of publicly available, Africa-centric automatic speech recognition (ASR) evaluation benchmarks hinders fair, cross-accent and cross-national model assessment and deployment. Method: We introduce the first multi-domain, multi-national ASR benchmark covering spoken English from over 10 African countries, encompassing 100+ regional accents and seven application domains. Our vertically structured evaluation framework integrates spontaneous and read speech, open- and closed-source ASR systems, and multimodal large language models (LLMs), employing fine-grained error analysis and joint latency–accuracy evaluation. Contribution/Results: Experiments reveal that open-source ASR models excel on spontaneous speech but suffer from poor noise robustness; multimodal LLMs exhibit strong accent robustness yet frequently misrecognize proper nouns; fine-tuned models balance accuracy and low latency but still generate pervasive hallucinations. This work establishes a novel paradigm for low-resource accent adaptation and provides a foundational resource for equitable ASR development in linguistically diverse African contexts.
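The joint latency–accuracy evaluation described above can be sketched minimally: word error rate (WER) computed via word-level edit distance, paired with wall-clock latency per utterance. This is an illustrative sketch, not the paper's actual pipeline; the `transcribe` callable and its inputs are placeholders.

```python
import time


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level Levenshtein distance over reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edits to turn the first i reference words into the first j hypothesis words
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i  # deletions
    for j in range(len(hyp) + 1):
        dp[0][j] = j  # insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            substitution = dp[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            dp[i][j] = min(substitution, dp[i - 1][j] + 1, dp[i][j - 1] + 1)
    return dp[len(ref)][len(hyp)] / max(len(ref), 1)


def timed_transcription_eval(transcribe, audio, reference):
    """Return (WER, wall-clock latency in seconds) for one utterance."""
    start = time.perf_counter()
    hypothesis = transcribe(audio)
    latency = time.perf_counter() - start
    return wer(reference, hypothesis), latency
```

In a real benchmark run, WER and latency would be aggregated per accent, country, and domain to reproduce the kind of fine-grained comparison the paper reports.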

📝 Abstract
Recent advances in speech-enabled AI, including Google's NotebookLM and OpenAI's speech-to-speech API, are driving widespread interest in voice interfaces globally. Despite this momentum, there is no publicly available, application-specific model evaluation that caters to Africa's linguistic diversity. We present AfriSpeech-MultiBench, the first domain-specific evaluation suite for over 100 African English accents across 10+ countries and seven application domains: Finance, Legal, Medical, General Dialogue, Call Center, Named Entities, and Hallucination Robustness. We benchmark a diverse range of open, closed, unimodal ASR and multimodal LLM-based speech recognition systems using both spontaneous and non-spontaneous conversational speech drawn from various open African-accented English speech datasets. Our empirical analysis reveals systematic variation: open-source ASR models excel in spontaneous speech contexts but degrade on noisy, non-native dialogue; multimodal LLMs are more accent-robust yet struggle with domain-specific named entities; proprietary models deliver high accuracy on clean speech but vary significantly by country and domain. Models fine-tuned on African English achieve competitive accuracy with lower latency, a practical advantage for deployment, though hallucinations remain a significant problem for most SOTA models. By releasing this comprehensive benchmark, we empower practitioners and researchers to select voice technologies suited to African use cases, fostering inclusive voice applications for underserved communities.
Problem

Research questions and friction points this paper is trying to address.

Lack of specialized ASR evaluation for diverse African English accents
Need to assess model performance across multiple domains and countries
Addressing accuracy gaps in noisy, non-native speech recognition
Innovation

Methods, ideas, or system contributions that make the work stand out.

Domain-specific benchmark for African English accents
Multimodal LLMs evaluated for accent robustness
Fine-tuned models achieve competitive accuracy with low latency