MLAAD: The Multi-Language Audio Anti-Spoofing Dataset

📅 2024-01-17

🏛️ IEEE International Joint Conference on Neural Networks

📈 Citations: 34

✨ Influential: 3

🤖 AI Summary

Problem: Existing audio anti-spoofing detectors generalize poorly across languages because their training data is concentrated on English and Chinese. Method: The paper introduces MLAAD, a large-scale multilingual anti-spoofing dataset comprising 160.2 hours of synthetic speech in 23 languages, generated with 52 TTS models spanning 22 architectures. It targets the gap in non-English, non-Chinese spoofing data. Contribution/Results: In cross-dataset evaluation with three state-of-the-art deepfake detection models, training on MLAAD outperforms training on comparable datasets such as InTheWild or FakeOrReal. Compared with ASVspoof 2019, MLAAD is complementary: across eight test datasets, MLAAD-trained models perform best on four and ASVspoof 2019-trained models on the other four. The dataset and trained models are released to make anti-spoofing technology more widely accessible.

📝 Abstract

Text-to-Speech (TTS) technology brings significant advantages, such as giving a voice to those with speech impairments, but also enables audio deepfakes and spoofs. The former mislead individuals and may propagate misinformation, while the latter undermine voice biometric security systems. AI-based detection can help to address these challenges by automatically differentiating between genuine and fabricated voice recordings. However, these models are only as good as their training data, which currently is severely limited due to an overwhelming concentration on English and Chinese audio in anti-spoofing databases, thus restricting their worldwide effectiveness. In response, this paper presents the Multi-Language Audio Anti-Spoof Dataset (MLAAD), created using 52 TTS models, comprising 22 different architectures, to generate 160.2 hours of synthetic voice in 23 different languages. We train and evaluate three state-of-the-art deepfake detection models with MLAAD, and observe that MLAAD demonstrates superior performance over comparable datasets like InTheWild or FakeOrReal when used as a training resource. Furthermore, in comparison with the renowned ASVspoof 2019 dataset, MLAAD proves to be a complementary resource. In tests across eight datasets, MLAAD and ASVspoof 2019 alternately outperformed each other, both excelling on four datasets. By publishing MLAAD and making trained models accessible via an interactive webserver, we aim to democratize anti-spoofing technology, making it accessible beyond the realm of specialists, thus contributing to global efforts against audio spoofing and deepfakes.
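
The "four datasets each" comparison in the abstract amounts to a per-test-set tally of which training set gives the best detector. The sketch below shows one way such a tally could be computed. It is only an illustration: the function name `count_wins`, the nested-dictionary layout, and the assumption that the metric is an error rate (lower is better) are not taken from the paper.

```python
from collections import Counter
from typing import Dict


def count_wins(
    scores: Dict[str, Dict[str, float]],
    lower_is_better: bool = True,
) -> Counter:
    """Count on how many test sets each training set scores best.

    scores[test_set][training_set] holds one metric value, for example an
    error rate. This layout is a hypothetical choice for illustration.
    """
    pick = min if lower_is_better else max
    wins: Counter = Counter()
    for by_training_set in scores.values():
        # On an exact tie, the training set listed first is counted.
        best = pick(by_training_set, key=by_training_set.get)
        wins[best] += 1
    return wins
```

Called on results for eight test sets and two training sets, a 4/4 split in the returned counter would correspond to the complementary outcome the abstract describes.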

Problem

Research questions and friction points this paper is trying to address.

Limited language diversity in current audio anti-spoofing datasets

Dependence of detection models on skewed training data

Need for globally applicable anti-spoofing solutions beyond English and Chinese

Innovation

Methods, ideas, or system contributions that make the work stand out.

Multilingual dataset covering 23 languages

Synthesized with 52 TTS models spanning 22 architectures

Complements ASVspoof 2019 as a training resource, with each leading on four of eight test datasets

🔎 Similar Papers

Audio Anti-Spoofing Detection: A Survey

2024-04-22 · arXiv.org · Citations: 25
