Deepfake Detection of Singing Voices With Whisper Encodings

📅 2025-01-31
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
The proliferation of synthetic singing voice deepfakes in the music industry poses significant challenges for authenticating vocal content. Method: This paper proposes a deepfake detection method leveraging noise-variant features extracted from Whisper encoders. Departing from conventional approaches that exploit Whisper’s robustness, we first identify and harness its sensitivity to noise—specifically, forged singing voices induce distinctive, scale-dependent (tiny/base/small/medium) encoding variations across Whisper models. These variations are formalized as discriminative features. We further integrate CNN and ResNet34 architectures to jointly model both dry (unmixed) and mixed audio scenarios. Results: Extensive experiments demonstrate that our method achieves significantly lower equal error rates (EER) compared to state-of-the-art baselines, validating the effectiveness and generalizability of noise-variant encoding features for singing voice deepfake detection.

Technology Category

Application Category

📝 Abstract
The deepfake generation of singing vocals is a concerning issue for artists in the music industry. In this work, we propose a singing voice deepfake detection (SVDD) system, which uses noise-variant encodings of open-AI's Whisper model. As counter-intuitive as it may sound, even though the Whisper model is known to be noise-robust, the encodings are rich in non-speech information, and are noise-variant. This leads us to evaluate Whisper encodings as feature representations for the SVDD task. Therefore, in this work, the SVDD task is performed on vocals and mixtures, and the performance is evaluated in %EER over varying Whisper model sizes and two classifiers- CNN and ResNet34, under different testing conditions.
Problem

Research questions and friction points this paper is trying to address.

Deepfake Detection
Singing Voice
Audio Authentication
Innovation

Methods, ideas, or system contributions that make the work stand out.

SVDD System
Whisper Model
Forgery Detection
🔎 Similar Papers
No similar papers found.
F
Falguni Sharma
Centre for Artificial Intelligence, Banasthali Vidyapith, Jaipur, India
Priyanka Gupta
Priyanka Gupta
Scientist, TCS Research
Dynamic PersonalizationMachine LearningDeep LearningTime Series