SSAFE: Simple and Strong AI-Generated Image Detection via Frozen Vision Encoders

📅 2026-06-07
🤖 AI Summary
This work addresses the proliferation of AI-generated images and the limitations of existing detectors, which typically depend on large-scale real-fake training data that is hard to keep current as new generators emerge. The authors propose a lightweight detection framework built on a frozen multimodal vision encoder. Because real and synthetic images are already separable in the encoder's embedding space, a simple linear classifier achieves strong performance without task-specific fine-tuning. A representation-aware data curation strategy selects a compact set of representative generators, yielding a training set of only 10K images, compared to 288K in AIGIBench and 4M in OpenFake. Despite the smaller training set, the approach improves robustness to unseen generators and distribution shifts. The authors also introduce RealWorldBench, a benchmark of modern camera photographs, contemporary stock images, and outputs from recent commercial generators.
📝 Abstract
The rapid advancement of generative models has blurred the boundary between synthetic and real imagery, creating an urgent need for reliable deepfake detection. Yet most existing approaches rely on massive real-fake datasets, which are increasingly difficult to maintain as new generators continue to emerge. In this work, we investigate how much information about image authenticity is already encoded in modern multimodal vision representations. We find that frozen multimodal encoders naturally separate real and synthetic images in their embedding space, enabling a simple linear classifier to achieve strong performance without task-specific fine-tuning. Motivated by this observation, we develop a representation-aware data curation strategy that selects a compact set of representative generators for training. The resulting training set contains only 10K images, compared to 288K in AIGIBench and 4M in OpenFake, while improving robustness to unseen generators and distribution shifts. We additionally introduce RealWorldBench, a benchmark consisting of modern camera photographs, contemporary stock images, and outputs from recent commercial generators. Experiments across multiple benchmarks show that combining frozen multimodal representations with carefully curated training data provides a simple and effective approach to AI-generated image detection.
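The linear-probe idea in the abstract (a linear classifier trained on top of embeddings from an encoder that is never updated) can be illustrated with a minimal sketch. The abstract does not name the encoder, the optimizer, or any hyperparameters, so everything below is an assumption made for illustration. Synthetic Gaussian vectors stand in for frozen-encoder embeddings, and the helper names (`make_embeddings`, `train_linear_probe`), the embedding dimension, and the learning rate are hypothetical rather than the authors' implementation.

```python
import math
import random


def make_embeddings(n, dim, shift, rng):
    """Hypothetical stand-in for frozen-encoder outputs: Gaussian vectors centered at `shift`."""
    return [[rng.gauss(shift, 1.0) for _ in range(dim)] for _ in range(n)]


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def score(w, b, x):
    # Linear score of one embedding under the probe.
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def train_linear_probe(X, y, epochs=100, lr=0.1):
    """Fit logistic-regression weights by batch gradient descent.

    Only `w` and `b` are learned; the embeddings in X are treated as fixed,
    which mirrors keeping the encoder frozen.
    """
    n, dim = len(X), len(X[0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, label in zip(X, y):
            err = sigmoid(score(w, b, x)) - label
            for j in range(dim):
                grad_w[j] += err * x[j]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def accuracy(w, b, X, y):
    # Predict "synthetic" (1) when the linear score is positive.
    correct = sum((score(w, b, x) > 0) == bool(label) for x, label in zip(X, y))
    return correct / len(X)


rng = random.Random(0)
DIM = 8  # assumed toy dimension; real encoder embeddings are much larger

# Label 0 = real, label 1 = synthetic. The class offset is an assumption that
# imitates the separability the abstract reports for frozen encoders.
X_train = make_embeddings(100, DIM, -1.0, rng) + make_embeddings(100, DIM, 1.0, rng)
y_train = [0] * 100 + [1] * 100
X_test = make_embeddings(50, DIM, -1.0, rng) + make_embeddings(50, DIM, 1.0, rng)
y_test = [0] * 50 + [1] * 50

w, b = train_linear_probe(X_train, y_train)
test_acc = accuracy(w, b, X_test, y_test)
print(f"held-out accuracy on synthetic stand-in data: {test_acc:.3f}")
```

In a real pipeline the stand-in vectors would be replaced by embeddings computed once from the frozen encoder and cached, and an off-the-shelf logistic regression would likely replace the hand-written loop. The toy data says nothing about the accuracy reported in the paper.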
Problem

Research questions and friction points this paper is trying to address.

AI-generated image detection
deepfake detection
synthetic imagery
image authenticity
generative models
Innovation

Methods, ideas, or system contributions that make the work stand out.

frozen vision encoders
AI-generated image detection
representation-aware curation
linear classifier
RealWorldBench