🤖 AI Summary
The proliferation of synthetic singing voice deepfakes in the music industry poses significant challenges for authenticating vocal content. Method: This paper proposes a deepfake detection method leveraging noise-variant features extracted from Whisper encoders. Departing from conventional approaches that exploit Whisper’s robustness, we first identify and harness its sensitivity to noise—specifically, forged singing voices induce distinctive, scale-dependent (tiny/base/small/medium) encoding variations across Whisper models. These variations are formalized as discriminative features. We further integrate CNN and ResNet34 architectures to jointly model both dry (unmixed) and mixed audio scenarios. Results: Extensive experiments demonstrate that our method achieves significantly lower equal error rates (EER) compared to state-of-the-art baselines, validating the effectiveness and generalizability of noise-variant encoding features for singing voice deepfake detection.
📝 Abstract
The deepfake generation of singing vocals is a concerning issue for artists in the music industry. In this work, we propose a singing voice deepfake detection (SVDD) system that uses noise-variant encodings from OpenAI's Whisper model. Counter-intuitively, although the Whisper model is known to be noise-robust, its encodings are rich in non-speech information and are noise-variant. This leads us to evaluate Whisper encodings as feature representations for the SVDD task. Accordingly, the SVDD task is performed on both vocals and mixtures, and performance is evaluated in %EER over varying Whisper model sizes and two classifiers, CNN and ResNet34, under different testing conditions.
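The abstract reports performance in %EER (equal error rate): the operating point where the false-acceptance rate on spoofed audio equals the false-rejection rate on genuine audio. As a minimal illustrative sketch (the function name and the convention that higher scores mean "more likely genuine" are our own, not taken from the paper), EER can be computed from detector scores like this:

```python
def compute_eer(genuine_scores, spoof_scores):
    """Find the threshold where the false-acceptance rate (spoofed audio
    scored at or above the threshold) is closest to the false-rejection
    rate (genuine audio scored below it), and return their average.
    Assumes higher scores indicate genuine audio."""
    thresholds = sorted(genuine_scores + spoof_scores)
    best = None  # (|FAR - FRR|, candidate EER)
    for t in thresholds:
        far = sum(s >= t for s in spoof_scores) / len(spoof_scores)
        frr = sum(s < t for s in genuine_scores) / len(genuine_scores)
        if best is None or abs(far - frr) < best[0]:
            best = (abs(far - frr), (far + frr) / 2)
    return best[1]

# Perfectly separated scores give 0% EER:
print(compute_eer([0.9, 0.8, 0.7], [0.1, 0.2, 0.3]) * 100)  # 0.0
```

In practice, SVDD evaluations compute EER over the detector's score distributions on held-out genuine and deepfake vocals; a lower %EER indicates better separation between the two classes.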