Replay Attacks Against Audio Deepfake Detection

📅 2025-05-20
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work exposes the severe threat posed by replay attacks—where synthetic speech is played through loudspeakers and re-recorded via microphones—to deepfake audio detection. Such physical-layer attacks significantly degrade detector performance, causing false acceptance of forged speech as genuine. To address this gap, the authors introduce ReplayDF, the first systematic, cross-lingual, multi-device, and multi-TTS benchmark dataset for replay attacks, constructed from M-AILABS and MLAAD and incorporating realistic acoustic channel distortions. Evaluation across six state-of-the-art detectors—including W2V2-AASIST—reveals that replay attacks increase W2V2-AASIST’s equal error rate (EER) from 4.7% to 18.2%. Even after adaptive retraining with room impulse responses (RIRs), EER remains as high as 11.0%, demonstrating the fundamental vulnerability of current methods in real-world settings. This work establishes a critical benchmark and provides empirical evidence to advance robust audio deepfake detection.

📝 Abstract
We show how replay attacks undermine audio deepfake detection: By playing and re-recording deepfake audio through various speakers and microphones, we make spoofed samples appear authentic to the detection model. To study this phenomenon in more detail, we introduce ReplayDF, a dataset of recordings derived from M-AILABS and MLAAD, featuring 109 speaker-microphone combinations across six languages and four TTS models. It includes diverse acoustic conditions, some highly challenging for detection. Our analysis of six open-source detection models across five datasets reveals significant vulnerability, with the top-performing W2V2-AASIST model's Equal Error Rate (EER) surging from 4.7% to 18.2%. Even with adaptive Room Impulse Response (RIR) retraining, performance remains compromised with an 11.0% EER. We release ReplayDF for non-commercial research use.
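The headline numbers above (EER rising from 4.7% to 18.2%) use the Equal Error Rate: the operating point where the false acceptance rate (spoofed audio accepted as genuine) equals the false rejection rate (genuine audio flagged as spoofed). A minimal sketch of how EER is computed from detector scores, using toy synthetic scores rather than any data from the paper:

```python
import numpy as np

def equal_error_rate(scores_genuine, scores_spoof):
    """Find the threshold where false acceptance rate (spoof scored as
    genuine) and false rejection rate (genuine scored as spoof) are
    closest, and return their average as the EER."""
    thresholds = np.sort(np.concatenate([scores_genuine, scores_spoof]))
    eer, best_gap = 1.0, np.inf
    for t in thresholds:
        far = np.mean(scores_spoof >= t)   # spoofed samples accepted
        frr = np.mean(scores_genuine < t)  # genuine samples rejected
        if abs(far - frr) < best_gap:
            best_gap, eer = abs(far - frr), (far + frr) / 2
    return eer

# Toy example: higher score = more likely genuine
rng = np.random.default_rng(0)
genuine = rng.normal(1.0, 0.5, 1000)
spoof = rng.normal(-1.0, 0.5, 1000)
print(f"EER: {equal_error_rate(genuine, spoof):.1%}")
```

A replay attack works precisely by shifting the spoof score distribution toward the genuine one, which pushes the crossing point (and hence the EER) upward.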
Problem

Research questions and friction points this paper is trying to address.

Replay attacks deceive audio deepfake detection systems
Diverse acoustic conditions challenge detection model accuracy
Existing models show vulnerability despite adaptive retraining
Innovation

Methods, ideas, or system contributions that make the work stand out.

ReplayDF dataset with diverse speaker-microphone combinations
Analysis of six detection models across five datasets
Room Impulse Response retraining for improved performance
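The RIR retraining idea in the list above rests on a standard approximation: the loudspeaker-room-microphone replay channel can be modeled by convolving the waveform with a room impulse response and adding noise. A minimal sketch of that augmentation, with illustrative function and parameter names (not the paper's pipeline):

```python
import numpy as np

def simulate_replay(audio, rir, snr_db=30.0, rng=None):
    """Approximate a replay channel: convolve the waveform with a room
    impulse response, then add noise at the requested SNR. Hypothetical
    helper for illustration only."""
    rng = rng or np.random.default_rng()
    wet = np.convolve(audio, rir)[: len(audio)]        # acoustic channel
    sig_pow = np.mean(wet ** 2)
    noise_pow = sig_pow / (10 ** (snr_db / 10))        # target noise power
    wet = wet + rng.normal(0.0, np.sqrt(noise_pow), len(wet))
    peak = np.max(np.abs(wet))
    return wet / peak if peak > 0 else wet             # peak-normalize

# Toy example: a decaying-exponential "room" applied to a 440 Hz tone
sr = 16000
tone = np.sin(2 * np.pi * 440 * np.arange(sr) / sr)
rir = np.exp(-np.arange(int(0.05 * sr)) / (0.01 * sr))
replayed = simulate_replay(tone, rir, snr_db=20.0)
print(replayed.shape)
```

The paper's finding that EER stays at 11.0% even after such RIR-based retraining suggests real replay channels introduce distortions this convolutional model does not fully capture.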
👥 Authors
Nicolas Muller, Fraunhofer AISEC, Germany
Piotr Kawa, Wrocław University of Science and Technology
Wei-Herng Choong, Fraunhofer AISEC, Germany
Adriana Stan, Technical University of Cluj-Napoca, Romania
Aditya Tirumala Bukkapatnam, Resemble AI, USA
Karla Pizzi, Technical University Munich
Alexander Wagner, Fraunhofer AISEC, Germany
Philip Sperl, Fraunhofer AISEC