Wavelet-Driven Generalizable Framework for Deepfake Face Forgery Detection

📅 2024-09-26
📈 Citations: 0
Influential: 0

🤖 AI Summary
Problem: Deepfake face detectors generalize poorly across datasets and are not robust to images generated by unseen diffusion models. Method: This paper proposes Wavelet-CLIP, a unified detection framework that combines wavelet transforms with features from a CLIP-pretrained ViT-L/14. It fuses multi-scale wavelet frequency-domain analysis with vision-language aligned features to model spatial and frequency cues jointly. The approach employs Haar and Daubechies wavelet decomposition, contrastive-learning-driven cross-modal feature alignment, and cross-dataset transfer training, and is evaluated under zero-shot settings for robustness against unknown forgery sources. Contribution/Results: Experiments show an average cross-dataset AUC of 0.749 and an AUC of 0.893 on unseen diffusion-model-generated forgeries, outperforming all compared methods.
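As a rough illustration of the Haar decomposition mentioned in the summary, the sketch below applies a single-level 1D Haar transform to a small vector and then inverts it. It is not the authors' implementation: the function names `haar_dwt` and `haar_idwt` are hypothetical, and the `features` list is a made-up stand-in for an image embedding.

```python
import math

def haar_dwt(signal):
    """Single-level 1D Haar DWT: returns (approximation, detail) coefficients."""
    if len(signal) % 2 != 0:
        raise ValueError("signal length must be even")
    s = 1.0 / math.sqrt(2.0)
    # Low-pass band: scaled pairwise sums (coarse structure).
    approx = [(signal[i] + signal[i + 1]) * s for i in range(0, len(signal), 2)]
    # High-pass band: scaled pairwise differences (fine detail).
    detail = [(signal[i] - signal[i + 1]) * s for i in range(0, len(signal), 2)]
    return approx, detail

def haar_idwt(approx, detail):
    """Inverse of haar_dwt: rebuilds the original signal from both bands."""
    s = 1.0 / math.sqrt(2.0)
    out = []
    for a, d in zip(approx, detail):
        out.append((a + d) * s)
        out.append((a - d) * s)
    return out

# Hypothetical toy vector standing in for an encoder embedding (assumption).
features = [0.5, 0.5, 1.0, -1.0, 2.0, 0.0, 0.25, 0.75]
low, high = haar_dwt(features)
restored = haar_idwt(low, high)
```

The low band summarizes neighbouring pairs while the high band records how they differ, and the two bands together reconstruct the input exactly. How the paper combines such bands with the ViT-L/14 features is not specified in this summary, so that step is left out here.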

📝 Abstract
The evolution of digital image manipulation, particularly with the advancement of deep generative models, significantly challenges existing deepfake detection methods, especially when the origin of the deepfake is obscure. To tackle the increasing complexity of these forgeries, we propose Wavelet-CLIP, a deepfake detection framework that integrates wavelet transforms with features derived from the ViT-L/14 architecture, pre-trained in the CLIP fashion. Wavelet-CLIP uses wavelet transforms to analyze both spatial and frequency features of images, enhancing the model's ability to detect sophisticated deepfakes. To verify the effectiveness of our approach, we conducted extensive evaluations against existing state-of-the-art methods for cross-dataset generalization and for detection of unseen images generated by standard diffusion models. Our method achieves an average AUC of 0.749 for cross-dataset generalization and 0.893 for robustness against unseen deepfakes, outperforming all compared methods. The code is available at: https://github.com/lalithbharadwajbaru/Wavelet-CLIP
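The headline numbers above are AUC values averaged over evaluation datasets. A minimal sketch of that metric follows, using the pairwise-ranking definition of ROC AUC. The helper names `roc_auc` and `mean_auc` are hypothetical, and the labels and scores in `toy` are invented purely for illustration; they are not results from the paper.

```python
def roc_auc(labels, scores):
    """ROC AUC as the probability that a fake (label 1) outscores a real (label 0)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))

def mean_auc(per_dataset):
    """Compute AUC per dataset, then the unweighted mean across datasets."""
    aucs = {name: roc_auc(y, s) for name, (y, s) in per_dataset.items()}
    return aucs, sum(aucs.values()) / len(aucs)

# Invented toy data: (labels, detector scores) per hypothetical dataset.
toy = {
    "dataset_a": ([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1]),
    "dataset_b": ([1, 0, 1, 0], [0.8, 0.3, 0.7, 0.2]),
}
per_dataset_auc, average_auc = mean_auc(toy)
```

The quadratic pairwise loop is fine for a toy example; on real benchmarks a sort-based routine such as scikit-learn's `roc_auc_score` would be the usual choice. Whether the paper averages with equal weight per dataset is an assumption of this sketch.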
Problem

Research questions and friction points this paper is trying to address.

Hyper-realistic Face Forgery
Image Authentication
Digital Forensics
Innovation

Methods, ideas, or system contributions that make the work stand out.

Wavelet-CLIP
Forgery-Detection
Cross-Dataset Performance