SKDU at De-Factify 4.0: Vision Transformer with Data Augmentation for AI-Generated Image Detection

📅 2025-03-24

🤖 AI Summary

This work addresses the detection of AI-generated images. It proposes a detection method based on fine-tuning a Vision Transformer (ViT), designed to generalize across diverse generative models. On the Defactify-4.0 multi-source synthetic image dataset, the approach systematically combines heterogeneous perturbations (geometric transformations, Gaussian noise, and JPEG compression) as data augmentation. Using transfer learning and end-to-end fine-tuning, the model is reported to achieve state-of-the-art performance on both validation and test sets, outperforming competing methods in accuracy and F1-score. The results indicate that pairing the ViT architecture with composite data augmentation improves cross-generator generalization and robustness, including on outputs from diffusion-based generators such as Stable Diffusion, DALL-E 3, and MidJourney.

📝 Abstract

The aim of this work is to explore the potential of pre-trained vision models such as Vision Transformers (ViT), enhanced with data augmentation strategies, for the detection of AI-generated images. Our approach fine-tunes a ViT model on the Defactify-4.0 dataset, which includes images generated by state-of-the-art models such as Stable Diffusion 2.1, Stable Diffusion XL, Stable Diffusion 3, DALL-E 3, and MidJourney. During training we apply perturbations such as flipping, rotation, Gaussian noise injection, and JPEG compression to improve model robustness and generalisation. The experimental results demonstrate that our ViT-based pipeline achieves state-of-the-art performance, significantly outperforming competing methods on both the validation and test datasets.
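
The four perturbations named in the abstract can be illustrated with a minimal sketch using Pillow and NumPy. This is not the authors' implementation: the function name `augment`, the application probability `p`, the rotation range, the noise standard deviation, and the JPEG quality range are all hypothetical values chosen for illustration, since the abstract does not state them.

```python
import io

import numpy as np
from PIL import Image, ImageOps


def augment(img, rng, p=0.5, max_angle=15.0, noise_std=10.0, jpeg_quality=(30, 90)):
    """Randomly apply flip, rotation, Gaussian noise, and JPEG compression.

    All default parameter values are illustrative assumptions, not values
    taken from the paper. `rng` is a numpy.random.Generator.
    """
    img = img.convert("RGB")

    # Horizontal flip.
    if rng.random() < p:
        img = ImageOps.mirror(img)

    # Small random rotation; the output keeps the input size.
    if rng.random() < p:
        angle = float(rng.uniform(-max_angle, max_angle))
        img = img.rotate(angle)

    # Additive Gaussian noise in pixel space, clipped back to [0, 255].
    if rng.random() < p:
        arr = np.asarray(img, dtype=np.float32)
        noise = rng.normal(0.0, noise_std, size=arr.shape)
        arr = np.clip(arr + noise, 0, 255).astype(np.uint8)
        img = Image.fromarray(arr)

    # JPEG compression: encode to an in-memory buffer and decode again.
    if rng.random() < p:
        quality = int(rng.integers(jpeg_quality[0], jpeg_quality[1] + 1))
        buf = io.BytesIO()
        img.save(buf, format="JPEG", quality=quality)
        buf.seek(0)
        img = Image.open(buf).convert("RGB")

    return img


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    sample = Image.new("RGB", (224, 224), color=(120, 60, 200))
    out = augment(sample, rng)
    print(out.size, out.mode)
```

In a training loop, a function like this would typically be applied per sample before resizing and normalisation, so that the detector sees a differently perturbed copy of each image in every epoch.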

Problem

Research questions and friction points this paper is trying to address.

Detecting AI-generated images using Vision Transformers

Enhancing model robustness with data augmentation techniques

Outperforming existing methods on validation and test datasets

Innovation

Methods, ideas, or system contributions that make the work stand out.

Vision Transformer for AI image detection

Advanced data augmentation techniques

Fine-tuned model on Defactify-4.0 dataset
