🤖 AI Summary
Audio deepfake detection models built on foundation models such as Wav2Vec2 generalize poorly to forgery methods not represented in their fixed fine-tuning sets. To address this, the paper proposes Mixture-of-LoRA Experts (MoLoRA), which embeds multiple low-rank adapters into the attention layers and employs a dynamic routing mechanism to selectively activate task-specialized experts, enabling adaptive modeling of novel forgery patterns. The backbone parameters remain entirely frozen, keeping the approach computationally efficient and scalable. Experiments demonstrate that MoLoRA significantly outperforms standard fine-tuning in both in-domain and out-of-domain settings. Specifically, the best-performing model reduces the average out-of-domain equal error rate (EER) from 8.55% to 6.08%, substantially enhancing robustness against previously unseen attacks. This improvement underscores MoLoRA's effectiveness in mitigating domain shift and improving generalization to unseen spoofing methods in audio deepfake detection.
📝 Abstract
Foundation models such as Wav2Vec2 excel at representation learning in speech tasks, including audio deepfake detection. However, after being fine-tuned on a fixed set of bonafide and spoofed audio clips, they often fail to generalize to novel deepfake methods not represented in training. To address this, we propose a mixture-of-LoRA-experts approach that integrates multiple low-rank adapters (LoRA) into the model's attention layers. A routing mechanism selectively activates specialized experts, enhancing adaptability to evolving deepfake attacks. Experimental results show that our method outperforms standard fine-tuning in both in-domain and out-of-domain scenarios, reducing equal error rates relative to baseline models. Notably, our best MoE-LoRA model lowers the average out-of-domain EER from 8.55% to 6.08%, demonstrating its effectiveness in achieving generalizable audio deepfake detection.
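The core mechanism can be sketched in code: a frozen linear projection (standing in for an attention-layer weight) is augmented with several LoRA experts, and a token-wise router mixes the top-k experts' low-rank updates. This is a minimal illustrative sketch, not the paper's implementation; the expert count, rank, scaling, and top-k softmax routing shown here are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoLoRALinear(nn.Module):
    """Frozen linear layer plus a mixture of LoRA experts (illustrative sketch).

    Hypothetical configuration: num_experts, rank, top_k, and alpha are
    placeholder values, not the paper's reported settings.
    """

    def __init__(self, in_dim, out_dim, num_experts=4, rank=8, top_k=2, alpha=16.0):
        super().__init__()
        self.base = nn.Linear(in_dim, out_dim)
        self.base.weight.requires_grad_(False)  # backbone stays frozen
        self.base.bias.requires_grad_(False)
        # Per-expert low-rank factors: delta_W_e = B_e @ A_e, with rank r << min(in, out).
        # B is zero-initialized so each expert starts as an identity update.
        self.A = nn.Parameter(torch.randn(num_experts, rank, in_dim) * 0.01)
        self.B = nn.Parameter(torch.zeros(num_experts, out_dim, rank))
        self.router = nn.Linear(in_dim, num_experts)  # token-wise gating scores
        self.top_k = top_k
        self.scale = alpha / rank

    def forward(self, x):  # x: (batch, seq, in_dim)
        gates = F.softmax(self.router(x), dim=-1)      # (B, S, E)
        topv, topi = gates.topk(self.top_k, dim=-1)    # keep only top-k experts
        topv = topv / topv.sum(dim=-1, keepdim=True)   # renormalize their weights
        # Low-rank update from every expert: (B, S, E, out_dim)
        h = torch.einsum("bsi,eri->bser", x, self.A)
        delta = torch.einsum("bser,eor->bseo", h, self.B)
        # Gather the selected experts' updates and mix by gate weight
        mix = torch.zeros_like(delta[..., 0, :])
        for k in range(self.top_k):
            idx = topi[..., k]  # (B, S) expert index per token
            sel = torch.gather(
                delta, 2, idx[..., None, None].expand(-1, -1, 1, delta.size(-1))
            ).squeeze(2)
            mix = mix + topv[..., k : k + 1] * sel
        return self.base(x) + self.scale * mix
```

Only the `A`, `B`, and router parameters are trainable, so the adapter adds a small fraction of the backbone's parameter count while the routing lets different experts specialize on different forgery patterns.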