A Novel Unified Approach to Deepfake Detection

📅 2026-01-06
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work proposes a unified deepfake detection framework for images and videos, motivated by the growing threat that deepfakes pose to digital trust. It combines cross-attention between spatial and frequency-domain features with a blood detection module that draws on physiological signals such as blood pulsation. The approach uses either Swin Transformer or EfficientNet-B4 for visual feature extraction and employs BERT for multimodal fusion. On the FaceForensics++ (FF++) and Celeb-DF benchmarks, the Swin+BERT variant reports AUC scores of 99.80% and 99.88%, respectively, which the authors state exceed prior state-of-the-art results, and the method is also reported to generalize well across datasets.
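
The AUC figures quoted above are areas under the ROC curve. As a rough illustration of what that metric measures, the sketch below computes AUC as the fraction of (fake, real) pairs in which the fake sample receives the higher score, counting ties as half. The function name `roc_auc`, the convention that label 1 means "fake", and the toy scores are assumptions made for illustration; they are not taken from the paper.

```python
def roc_auc(labels, scores):
    """Pairwise (Mann-Whitney) estimate of the area under the ROC curve.

    labels: 1 for the positive class (assumed here to mean "fake"), 0 otherwise.
    scores: higher means the detector considers the sample more likely fake.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    # Count pairs where the positive outranks the negative; ties count as half.
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: three fake and three real samples.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.2, 0.1]
auc = roc_auc(labels, scores)
print(f"AUC = {auc:.4f}")
```

An AUC of 1.0 would mean every fake sample scores above every real one, so values such as 99.80% indicate that almost all such pairs are ranked correctly.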

📝 Abstract
Advances in AI are increasingly giving rise to new threats, one of the most prominent being the synthesis and misuse of deepfakes. To sustain trust in the digital age, detecting and tagging deepfakes is necessary. This paper presents a novel architecture for deepfake detection in images and videos. The architecture uses cross-attention between spatial and frequency-domain features, along with a blood detection module, to classify an image as real or fake. The paper aims to develop a unified architecture and provide insights into each step. Through this approach, we achieve results better than SOTA: 99.80% and 99.88% AUC on FF++ and Celeb-DF, respectively, using Swin Transformer and BERT, and 99.55% and 99.38% using EfficientNet-B4 and BERT. The approach also generalizes well, achieving strong cross-dataset results.
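
The abstract does not spell out how the cross-attention is wired, so the following is only a minimal sketch of generic scaled dot-product cross-attention, under the assumption that spatial tokens act as queries and frequency-domain tokens act as keys and values. The token counts, the feature dimension, the random inputs, and the omission of learned projection matrices are all illustrative choices and do not reflect the paper's implementation.

```python
import math
import random


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def transpose(a):
    return [list(col) for col in zip(*a)]


def matmul(a, b):
    """Multiply an (n x k) matrix by a (k x m) matrix, both as nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def cross_attention(queries, keys, values):
    """Each query token attends over all key/value tokens.

    Learned Q/K/V projections are omitted to keep the sketch short.
    Returns the attended features and the attention weights.
    """
    d = len(queries[0])
    scores = matmul(queries, transpose(keys))
    weights = [softmax([s / math.sqrt(d) for s in row]) for row in scores]
    return matmul(weights, values), weights


# Hypothetical inputs: 4 spatial tokens and 6 frequency tokens, each of dimension 8.
rng = random.Random(0)
spatial_tokens = [[rng.gauss(0, 1) for _ in range(8)] for _ in range(4)]
frequency_tokens = [[rng.gauss(0, 1) for _ in range(8)] for _ in range(6)]

fused, weights = cross_attention(spatial_tokens, frequency_tokens, frequency_tokens)
print(len(fused), len(fused[0]))  # one fused vector per spatial token
```

Each row of `weights` sums to 1, so every spatial token ends up with a convex combination of the frequency-domain features. A real model would typically add learned projections, multiple heads, and possibly attention in the reverse direction as well.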

Problem

Research questions and friction points this paper is trying to address.

Deepfake detection
AI-generated content
digital trust
image and video forgery

Innovation

Methods, ideas, or system contributions that make the work stand out.

cross attention
frequency domain features
blood detection module
unified deepfake detection
Swin Transformer