The Visual Counter Turing Test (VCT2): A Benchmark for Evaluating AI-Generated Image Detection and the Visual AI Index (VAI)

📅 2024-11-24
📈 Citations: 2
Influential: 0
🤖 AI Summary
Existing AI-generated image detectors exhibit poor generalization and insufficient robustness against unseen generative models. Method: We introduce VCT2, a large-scale cross-model benchmark comprising 166K images spanning six major text-to-image models and real-world photographs, enabling zero-shot detection evaluation. We further propose the Visual AI Index (VAI)—the first prompt-agnostic, interpretable metric for quantifying image authenticity—based on twelve low-level visual features, including noise distribution, spectral characteristics, and chromatic consistency. Contribution/Results: We identify a moderate negative correlation (r ≈ −0.52) between VAI scores and detector accuracy, revealing an intrinsic trade-off between authenticity and detectability. On the COCOAI and TwitterAI domains, 17 state-of-the-art detectors achieve only 58.0% and 58.34% average accuracy, respectively. All data, code, and the VAI implementation are publicly released.
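The paper defines VAI over twelve low-level features such as noise distribution, spectral characteristics, and chromatic consistency, but the exact formula is not given here. As a minimal illustrative sketch (all function names, the three features, and the uniform averaging are assumptions, not the paper's method), one can compute a few analogous hand-crafted features with NumPy and average them into a single score:

```python
import numpy as np

def noise_std(gray):
    """Std of a simple high-pass residual; a crude proxy for noise distribution."""
    # gray: float array in [0, 1], shape (H, W)
    blurred = (gray[:-1, :-1] + gray[1:, :-1] + gray[:-1, 1:] + gray[1:, 1:]) / 4.0
    residual = gray[:-1, :-1] - blurred
    return float(residual.std())

def spectral_highfreq_ratio(gray):
    """Fraction of power-spectrum energy outside a central low-frequency disk."""
    power = np.abs(np.fft.fftshift(np.fft.fft2(gray))) ** 2
    h, w = power.shape
    yy, xx = np.mgrid[0:h, 0:w]
    radius = np.hypot(yy - h / 2, xx - w / 2)
    cutoff = min(h, w) / 4.0  # arbitrary illustrative cutoff
    return float(power[radius > cutoff].sum() / power.sum())

def chroma_consistency(rgb):
    """Inverse spread of per-channel means; higher = more consistent chroma."""
    channel_means = rgb.reshape(-1, 3).mean(axis=0)
    return float(1.0 / (1.0 + channel_means.std()))

def toy_realism_index(rgb):
    """Uniform average of the three toy features, scaled to [0, 100]."""
    gray = rgb.mean(axis=2)
    feats = [noise_std(gray), spectral_highfreq_ratio(gray), chroma_consistency(rgb)]
    return 100.0 * float(np.mean(feats))
```

The actual VAI uses twelve features and is prompt-agnostic by construction; this sketch only illustrates the general idea of scoring realism from low-level statistics rather than from a learned classifier.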

📝 Abstract
The rapid progress and widespread availability of text-to-image (T2I) generative models have heightened concerns about the misuse of AI-generated visuals, particularly in the context of misinformation campaigns. Existing AI-generated image detection (AGID) methods often overfit to known generators and falter on outputs from newer or unseen models. We introduce the Visual Counter Turing Test (VCT2), a comprehensive benchmark of 166,000 images, comprising both real and synthetic prompt-image pairs produced by six state-of-the-art T2I systems: Stable Diffusion 2.1, SDXL, SD3 Medium, SD3.5 Large, DALL.E 3, and Midjourney 6. We curate two distinct subsets: COCOAI, featuring structured captions from MS COCO, and TwitterAI, containing narrative-style tweets from The New York Times. Under a unified zero-shot evaluation, we benchmark 17 leading AGID models and observe alarmingly low detection accuracy: 58% on COCOAI and 58.34% on TwitterAI. To transcend binary classification, we propose the Visual AI Index (VAI), an interpretable, prompt-agnostic realism metric based on twelve low-level visual features, enabling us to quantify and rank the perceptual quality of generated outputs with greater nuance. Correlation analysis reveals a moderate inverse relationship between VAI and detection accuracy, with a Pearson r of -0.532 on COCOAI and -0.503 on TwitterAI, suggesting that more visually realistic images tend to be harder to detect, a trend observed consistently across generators. We release COCOAI, TwitterAI, and all code to catalyze future advances in generalized AGID and perceptual realism assessment.
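The reported Pearson coefficients (-0.532 and -0.503) measure the linear relationship between per-generator VAI scores and detector accuracy. A self-contained sketch of the computation, using made-up illustrative (VAI, accuracy) pairs rather than the paper's actual per-generator numbers:

```python
import math

def pearson_r(xs, ys):
    """Sample Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical per-generator values, illustrative only: as VAI (realism)
# rises, detection accuracy falls, producing a negative correlation.
vai = [62.0, 70.5, 75.2, 81.3, 84.9, 88.1]
acc = [71.0, 66.5, 60.2, 55.8, 52.1, 49.4]
r = pearson_r(vai, acc)  # negative for this toy data
```

In practice one would use `scipy.stats.pearsonr`, which also returns a p-value; the manual version above just makes the formula explicit.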
Problem

Research questions and friction points this paper is trying to address.

Poor generalization of AI-generated image detection (AGID) methods across diverse generative models
Overfitting of detection methods to known generators, causing failures on newer or unseen models
Quantifying the perceptual realism of AI-generated images beyond binary real/fake classification
Innovation

Methods, ideas, or system contributions that make the work stand out.

VCT2 benchmark with 166,000 real and synthetic images
Visual AI Index using twelve low-level visual features
Unified zero-shot evaluation of 17 detection models
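The unified zero-shot protocol evaluates each of the 17 detectors as-is, with no fine-tuning on the benchmark's generators. A minimal sketch of such an evaluation harness (the types, function names, and detector interface are assumptions for illustration):

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical interface: a detector maps an image to True (AI-generated) or False.
Image = object
Detector = Callable[[Image], bool]

def zero_shot_accuracy(detector: Detector,
                       samples: List[Tuple[Image, bool]]) -> float:
    """Accuracy of a frozen detector on labeled (image, is_generated) pairs."""
    correct = sum(detector(img) == label for img, label in samples)
    return correct / len(samples)

def benchmark(detectors: Dict[str, Detector],
              samples: List[Tuple[Image, bool]]) -> Dict[str, float]:
    """Evaluate every detector on the same subset, with no per-generator tuning."""
    return {name: zero_shot_accuracy(d, samples) for name, d in detectors.items()}
```

Averaging the per-detector accuracies over a subset like COCOAI or TwitterAI yields the headline numbers (58% and 58.34%) reported in the abstract.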