🤖 AI Summary
This work exposes the severe threat that replay attacks, in which synthetic speech is played through loudspeakers and re-recorded via microphones, pose to deepfake audio detection. Such physical-layer attacks significantly degrade detector performance, causing forged speech to be falsely accepted as genuine. To study this threat systematically, the authors introduce ReplayDF, the first cross-lingual, multi-device, multi-TTS benchmark dataset for replay attacks, constructed from M-AILABS and MLAAD and incorporating realistic acoustic channel distortions. Evaluation of six state-of-the-art detectors, including W2V2-AASIST, reveals that replay attacks raise W2V2-AASIST's equal error rate (EER) from 4.7% to 18.2%. Even after adaptive retraining with room impulse responses (RIRs), the EER remains as high as 11.0%, demonstrating a fundamental vulnerability of current methods in real-world settings. ReplayDF thus establishes a critical benchmark and provides empirical evidence to advance robust audio deepfake detection.
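Conceptually, the replay channel studied here can be approximated in software by convolving synthetic speech with a room impulse response, which imprints the loudspeaker, room, and microphone coloration onto the signal. The sketch below is a minimal illustration of that idea, not the authors' recording pipeline; the file names are placeholders, and the use of soundfile and scipy is an assumption.

```python
# Minimal sketch: simulate a replay-style acoustic channel by convolving
# synthetic speech with a room impulse response (RIR). File names are
# illustrative; any mono WAVs sharing one sample rate will do.
import numpy as np
import soundfile as sf
from scipy.signal import fftconvolve

speech, sr = sf.read("deepfake_sample.wav")   # synthetic speech (mono)
rir, sr_rir = sf.read("room_ir.wav")          # measured or simulated RIR
assert sr == sr_rir, "resample so both signals share one sample rate"

# Convolution imprints loudspeaker/room/microphone coloration on the audio.
replayed = fftconvolve(speech, rir, mode="full")[: len(speech)]

# Normalize to avoid clipping, then write the "re-recorded" version.
replayed /= np.max(np.abs(replayed)) + 1e-9
sf.write("deepfake_replayed.wav", replayed, sr)
```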
📝 Abstract
We show how replay attacks undermine audio deepfake detection: by playing and re-recording deepfake audio through various speakers and microphones, we make spoofed samples appear authentic to the detection model. To study this phenomenon in more detail, we introduce ReplayDF, a dataset of recordings derived from M-AILABS and MLAAD, featuring 109 speaker-microphone combinations across six languages and four TTS models. It includes diverse acoustic conditions, some highly challenging for detection. Our analysis of six open-source detection models across five datasets reveals significant vulnerability, with the Equal Error Rate (EER) of the top-performing W2V2-AASIST model surging from 4.7% to 18.2%. Even with adaptive Room Impulse Response (RIR) retraining, performance remains compromised, with an EER of 11.0%. We release ReplayDF for non-commercial research use.
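The EER figures quoted above (4.7%, 18.2%, 11.0%) denote the operating point at which a detector's false acceptance and false rejection rates coincide. A minimal sketch of the standard way this metric is computed from raw detector scores, assuming scikit-learn is available, that higher scores indicate bona fide speech, and with toy values in place of real outputs:

```python
# Minimal sketch: compute Equal Error Rate (EER) from detector scores.
# Label convention and score values here are illustrative toy data.
import numpy as np
from sklearn.metrics import roc_curve

labels = np.array([1, 1, 1, 0, 0, 0])            # 1 = bona fide, 0 = spoof
scores = np.array([0.9, 0.8, 0.4, 0.6, 0.3, 0.1])

fpr, tpr, _ = roc_curve(labels, scores)          # false/true positive rates
fnr = 1 - tpr                                    # false negative rate
idx = np.nanargmin(np.abs(fnr - fpr))            # point where FNR ~= FPR
eer = (fpr[idx] + fnr[idx]) / 2
print(f"EER = {eer:.1%}")
```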