Few-Shot Speech Deepfake Detection Adaptation with Gaussian Processes

📅 2025-05-29
🤖 AI Summary
TTS-based voice cloning systems are advancing rapidly, which makes the resulting deepfakes harder to detect. This paper proposes ADD-GP, a few-shot adaptive framework that adapts to unseen TTS generators from a small number of authentic or synthetic samples, in some cases a single one. ADD-GP is presented as the first method to apply Gaussian process classification to audio deepfake detection; it combines deep acoustic embeddings with meta-learning to support personalized adaptation and cross-model generalization without large-scale labeled data. Its key contributions are: (1) removing the reliance on extensive annotated datasets typical of conventional supervised fine-tuning; (2) achieving >92% one-shot adaptation accuracy on a newly constructed benchmark built from state-of-the-art voice cloning models; and (3) improving robustness and transferability to previously unseen TTS models.

šŸ“ Abstract
Recent advancements in Text-to-Speech (TTS) models, particularly in voice cloning, have intensified the demand for adaptable and efficient deepfake detection methods. As TTS systems continue to evolve, detection models must be able to adapt efficiently to previously unseen generation models with minimal data. This paper introduces ADD-GP, a few-shot adaptive framework based on a Gaussian Process (GP) classifier for Audio Deepfake Detection (ADD). We show how combining a powerful deep embedding model with the flexibility of Gaussian processes can achieve strong performance and adaptability. We also show that this approach can be used for personalized detection, with greater robustness to new TTS models and one-shot adaptability. To support our evaluation, we construct a benchmark dataset for this task using new state-of-the-art voice cloning models.
Problem

Research questions and friction points this paper is trying to address.

Adapting deepfake detection to unseen TTS models with minimal data
Enhancing detection robustness against new voice cloning technologies
Achieving one-shot adaptability for personalized deepfake detection
Innovation

Methods, ideas, or system contributions that make the work stand out.

Gaussian Process classifier for deepfake detection
Few-shot adaptation with deep embedding
Personalized detection with one-shot adaptability
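
To make the idea of few-shot adaptation with a GP on top of fixed embeddings concrete, here is a minimal sketch. It is not the paper's implementation. It assumes hand-made 2-D points in place of real deep audio embeddings, an RBF kernel, and GP regression on ±1 labels as a simple stand-in for a true GP classifier. The names FewShotGP, rbf, and solve are hypothetical. Adapting to a new generator amounts to appending one labeled sample to the support set and re-solving a small linear system; no network weights are retrained.

```python
import math

def rbf(a, b, length_scale=1.0):
    """Squared-exponential (RBF) kernel between two embedding vectors."""
    sq_dist = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-0.5 * sq_dist / length_scale ** 2)

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x

class FewShotGP:
    """Hypothetical helper: GP regression on labels (+1 = fake, -1 = real)."""

    def __init__(self, length_scale=1.0, noise=0.1):
        self.length_scale = length_scale
        self.noise = noise
        self.X, self.y, self.alpha = [], [], []

    def add(self, embedding, label):
        """Add one labeled sample and recompute alpha = (K + noise*I)^-1 y."""
        self.X.append(list(embedding))
        self.y.append(float(label))
        n = len(self.X)
        K = [[rbf(self.X[i], self.X[j], self.length_scale)
              + (self.noise if i == j else 0.0) for j in range(n)]
             for i in range(n)]
        self.alpha = solve(K, self.y)

    def score(self, embedding):
        """Posterior mean at the query; positive values lean 'fake'."""
        return sum(a * rbf(x, embedding, self.length_scale)
                   for a, x in zip(self.alpha, self.X))

# Toy support set (assumed): real speech near (0, 0), a known generator near (3, 0).
gp = FewShotGP()
for point in [(0.0, 0.0), (0.3, -0.2)]:
    gp.add(point, -1)
for point in [(3.0, 0.0), (3.2, 0.3)]:
    gp.add(point, +1)

# A sample from an unseen generator lands in a new region of embedding space.
query = (0.2, 2.8)
before = gp.score(query)

# One-shot adaptation: add a single labeled sample from the new generator.
gp.add((0.0, 3.0), +1)
after = gp.score(query)

print(f"score before adaptation: {before:.3f}")
print(f"score after adaptation:  {after:.3f}")
```

In this toy setup the query scores slightly below zero before adaptation, because it lies far from every known fake, and clearly above zero after the single new sample is added. A real system would replace the label-regression shortcut with a proper GP classification likelihood and would use embeddings from a pretrained speech model.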