Distilled transformers with locally enhanced global representations for face forgery detection

📅 2024-12-01
🏛️ Pattern Recognition
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing deepfake detection models struggle to capture local details (the strength of CNNs) and global semantics (the strength of Transformers) at the same time. Hybrid architectures also suffer from attention instability, and generalization is poor when training data is limited. To address these issues, the paper proposes a lightweight distilled Transformer architecture. Its core idea is a locally enhanced global representation distillation paradigm: hierarchical attention and local window-based feature enhancement are combined within a knowledge distillation framework, so that model capacity is compressed while discriminative forgery cues are preserved. Evaluated on FaceForensics++ and Celeb-DF, the method reports state-of-the-art performance, with a maximum AUC of 99.2%. Compared with baseline models, it reduces parameter count by 38% and speeds up inference by 2.1×, indicating better efficiency and generalization, especially in low-data regimes.
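The summary does not give the paper's actual training objective, so the following is only a minimal sketch of a generic knowledge distillation loss (temperature-softened KL term plus hard-label cross-entropy), not the authors' method. The function names, the two-class real/fake setup, and the `temperature` and `alpha` values are all assumptions chosen for illustration.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits (numerically stable)."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def cross_entropy(probs, label, eps=1e-12):
    """Negative log-likelihood of the true class index."""
    return -math.log(probs[label] + eps)


def distillation_loss(student_logits, teacher_logits, label,
                      temperature=2.0, alpha=0.5):
    """Generic distillation loss (hypothetical weights, not from the paper).

    alpha weights the soft teacher-matching term; (1 - alpha) weights the
    hard-label term. The T^2 factor keeps the soft term's gradient scale
    comparable across temperatures.
    """
    soft_teacher = softmax(teacher_logits, temperature)
    soft_student = softmax(student_logits, temperature)
    soft_term = (temperature ** 2) * kl_divergence(soft_teacher, soft_student)
    hard_term = cross_entropy(softmax(student_logits), label)
    return alpha * soft_term + (1.0 - alpha) * hard_term


if __name__ == "__main__":
    # Two classes: index 0 = real, index 1 = fake (assumed labeling).
    teacher = [0.2, 3.1]
    student = [0.5, 1.9]
    print(distillation_loss(student, teacher, label=1))
```

When the student reproduces the teacher's logits exactly, the soft term vanishes and only the hard-label term remains, which is one way to sanity-check an implementation of this kind.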

Problem

Research questions and friction points this paper is trying to address.

Facial Forgery Detection
CNN-Transformer Integration
Limited Labeled Data

Innovation

Methods, ideas, or system contributions that make the work stand out.

DTN (Distillation Transformer Network)
Multi-Attention Scaling Module
Deep Fake Self-Distillation
Yaning Zhang
Qilu University of Technology (Shandong Academy of Sciences)
Qiufu Li
Computer Vision Institute, College of Computer Science and Software Engineering, Shenzhen University, Shenzhen, 518060, Guangdong, China; National Engineering Laboratory for Big Data System Computing Technology, Shenzhen University, Shenzhen, 518060, Guangdong, China
Zitong Yu
U.S. Food and Drug Administration
Medical imaging, Deep learning, Machine learning, Image reconstruction
Linlin Shen
Shenzhen University
Deep Learning, Computer Vision, Facial Analysis/Recognition, Medical Image Analysis