TextFake: Benchmarking AI-Generated Image Detection on Text-Rich Images

📅 2026-05-31

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Existing AI-generated image detectors perform well on natural images, but their performance on text-rich synthetic images, such as fake screenshots and forged documents, has not been examined. The authors introduce TextFake, a benchmark of 20,000 images spanning 28 languages, four topic categories, and two scene modalities, constructed with a four-stage controllable synthesis pipeline. To rule out covariate shortcuts, the pipeline generates fake counterparts through distribution-aligned structured prompting. The authors conduct the first systematic zero-shot evaluation of 14 specialized detectors and three vision-language model (VLM) APIs on such images. The study identifies three failure modes: the Text Density Curse, Cloaking via Rendering Fidelity, and Threshold Collapse. No method exceeds 80% accuracy on TextFake, and some drop by over 60% relative to natural-image benchmarks, which exposes severe limitations of current detectors in text-dense contexts.

📝 Abstract

Recent AI-generated image (AIGI) detectors perform well on natural-image benchmarks, but their behavior on text-rich forgeries, such as fabricated screenshots, documents, and news pages prevalent in misinformation, remains untested. We introduce TextFake, a 20,000-image benchmark for text-rich AIGI detection spanning 28 languages, 4 topic categories, and 2 scene modalities. Fake images are synthesized via a four-stage pipeline that annotates real images along three controlled dimensions and generates counterparts through distribution-aligned structured prompting, ruling out covariate shortcuts. Zero-shot evaluation of 14 specialized detectors and 3 frontier VLM APIs reveals a large systematic gap: no method exceeds 80% accuracy, with some dropping over 60% from natural-image benchmarks. Diagnostic evaluations identify three failure modes: the Text Density Curse, where dense glyphs overwhelm low-level detectors; Cloaking via Rendering Fidelity, where stronger text rendering suppresses generative artifacts; and Threshold Collapse, where routine perturbations drive detectors toward chance-level performance.
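The Threshold Collapse failure mode can be illustrated with a small sketch. It is not the paper's evaluation code: the scores, the 0.5 threshold, the uniform downward shift standing in for a perturbation, and the function name `accuracy_at_threshold` are all assumptions chosen for illustration. The point is only that a detector evaluated at a fixed decision threshold can fall to chance-level accuracy on a balanced set when a perturbation shifts its scores, even if the ranking of fake above real is unchanged.

```python
def accuracy_at_threshold(scores, labels, threshold=0.5):
    """Accuracy when an image is called 'fake' if its score >= threshold."""
    correct = sum((s >= threshold) == bool(y) for s, y in zip(scores, labels))
    return correct / len(labels)


# Hypothetical detector outputs (not taken from the paper): 1 = fake, 0 = real.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
clean_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]

# Assumed effect of a perturbation: every score drops by 0.5, clipped at 0.
shifted_scores = [max(0.0, s - 0.5) for s in clean_scores]

clean_acc = accuracy_at_threshold(clean_scores, labels)
shifted_acc = accuracy_at_threshold(shifted_scores, labels)

print(f"accuracy on clean scores:   {clean_acc:.2f}")
print(f"accuracy on shifted scores: {shifted_acc:.2f}")
```

In this toy setup every shifted score falls below the threshold, so all images are labeled real and accuracy on the balanced set is 50%, although the fake images still score higher than the real ones.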

Problem

Research questions and friction points this paper is trying to address.

AI-generated image detection

text-rich images

misinformation

benchmarking

forgery detection

Innovation

Methods, ideas, or system contributions that make the work stand out.

TextFake

AI-generated image detection

text-rich forgery