🤖 AI Summary
Facing the growing challenge of detecting deepfakes generated by rapidly advancing TTS-based voice cloning systems, this paper proposes ADD-GP, a few-shot adaptive framework designed to adapt efficiently to unseen TTS generators using only a small number of samples (even a single one). Methodologically, ADD-GP is the first to introduce Gaussian process classification into audio deepfake detection, combining deep acoustic embeddings with the flexibility of Gaussian processes to enable personalized detection and cross-model generalization without large-scale labeled data. Its key contributions are: (1) eliminating the reliance on the extensive annotated datasets typical of conventional supervised fine-tuning; (2) achieving >92% one-shot adaptation accuracy on a newly constructed benchmark built from state-of-the-art voice cloning models; and (3) significantly improving robustness and transferability to previously unseen TTS models.
📄 Abstract
Recent advancements in Text-to-Speech (TTS) models, particularly in voice cloning, have intensified the demand for adaptable and efficient deepfake detection methods. As TTS systems continue to evolve, detection models must be able to adapt efficiently to previously unseen generation models with minimal data. This paper introduces ADD-GP, a few-shot adaptive framework based on a Gaussian Process (GP) classifier for Audio Deepfake Detection (ADD). We show how combining a powerful deep embedding model with the flexibility of Gaussian processes can achieve strong performance and adaptability. Additionally, we show that this approach can also be used for personalized detection, with greater robustness to new TTS models and one-shot adaptability. To support our evaluation, a benchmark dataset is constructed for this task using new state-of-the-art voice cloning models.
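To make the core idea concrete, here is a minimal sketch (not the authors' code) of GP classification over fixed embeddings, where "one-shot adaptation" to an unseen generator amounts to conditioning the non-parametric GP on a single extra labeled sample and refitting. The random 2-D embeddings and scikit-learn's `GaussianProcessClassifier` are placeholders standing in for the paper's deep acoustic embedding model and meta-learned GP:

```python
# Hedged sketch: GP-based audio deepfake detection with one-shot adaptation.
# Random vectors stand in for deep acoustic embeddings of real/fake audio.
import numpy as np
from sklearn.gaussian_process import GaussianProcessClassifier
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(0)
D = 2  # embedding dimension (placeholder)

# Placeholder embeddings: bona fide speech vs. a known TTS generator.
real = rng.normal(loc=0.0, scale=1.0, size=(40, D))
fake_known = rng.normal(loc=2.0, scale=1.0, size=(40, D))
X = np.vstack([real, fake_known])
y = np.array([0] * 40 + [1] * 40)  # 0 = real, 1 = fake

# Base detector: GP classifier on top of the (frozen) embeddings.
gp = GaussianProcessClassifier(kernel=1.0 * RBF(length_scale=1.0))
gp.fit(X, y)

# An unseen generator whose embeddings fall in a different region.
fake_new = rng.normal(loc=-2.0, scale=1.0, size=(20, D))

# One-shot adaptation: add a single labeled sample from the new
# generator and refit; a GP adapts by conditioning on the new point,
# with no gradient-based fine-tuning of the embedding model.
X_adapt = np.vstack([X, fake_new[:1]])
y_adapt = np.append(y, 1)
gp_adapt = GaussianProcessClassifier(kernel=1.0 * RBF(length_scale=1.0))
gp_adapt.fit(X_adapt, y_adapt)

acc_before = gp.score(fake_new[1:], np.ones(19))
acc_after = gp_adapt.score(fake_new[1:], np.ones(19))
print(f"accuracy on unseen generator: before={acc_before:.2f}, after={acc_after:.2f}")
```

The appeal of this setup, as the abstract suggests, is that adaptation needs no large labeled dataset and no retraining of the deep model: only the lightweight GP head is updated with the few available samples.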