Voice Conversion Improves Cross-Domain Robustness for Spoken Arabic Dialect Identification

📅 2025-05-30
🤖 AI Summary
Arabic dialect identification (ADI) systems generalize poorly to out-of-domain speech, which limits their reliability. The paper presents an approach based on voice conversion (VC) for training ADI models that achieves state-of-the-art performance and improves robustness in cross-domain scenarios. On a newly collected real-world test set spanning four domains, the approach yields consistent accuracy improvements of up to +34.1%. An accompanying analysis indicates that voice conversion helps mitigate speaker bias in the ADI dataset. The authors release their robust ADI model and the cross-domain evaluation dataset.

📝 Abstract
Arabic dialect identification (ADI) systems are essential for large-scale data collection pipelines that enable the development of inclusive speech technologies for Arabic language varieties. However, the reliability of current ADI systems is limited by poor generalization to out-of-domain speech. In this paper, we present an effective approach based on voice conversion for training ADI models that achieves state-of-the-art performance and significantly improves robustness in cross-domain scenarios. Evaluated on a newly collected real-world test set spanning four different domains, our approach yields consistent improvements of up to +34.1% in accuracy across domains. Furthermore, we present an analysis of our approach and demonstrate that voice conversion helps mitigate the speaker bias in the ADI dataset. We release our robust ADI model and cross-domain evaluation dataset to support the development of inclusive speech technologies for Arabic.
Problem

Research questions and friction points this paper is trying to address.

Current ADI systems generalize poorly to out-of-domain speech
Poor generalization limits the reliability of ADI in large-scale data collection pipelines
Speaker bias in the ADI dataset, which voice conversion is used to mitigate
Innovation

Methods, ideas, or system contributions that make the work stand out.

Voice conversion enhances cross-domain ADI robustness
State-of-the-art performance achieved with voice conversion
Voice conversion reduces speaker bias in ADI
Badr M. Abdullah
Language Science and Technology, Saarland University, Germany
Matthew Baas
PhD student, Stellenbosch University
speech synthesis
Bernd Möbius
Language Science and Technology, Saarland University, Germany
Dietrich Klakow
Saarland University, Saarland Informatics Campus, PharmaScienceHub
Natural Language Processing, Speech Processing, Question Answering, Machine Learning