🤖 AI Summary
Current detectors for AI-generated images rely heavily on generator-specific artifacts. These cues break down under cascaded degradation, which arises from multi-round cross-platform sharing and post-processing in real-world scenarios, so the detectors generalize poorly. To address this, we propose Real-centric Envelope Modeling (REM), a paradigm that abandons reliance on volatile artifacts and instead learns a robust boundary around the manifold of authentic images in feature space. Methodologically, REM synthesizes near-real samples by perturbing features during self-reconstruction, and it jointly optimizes an envelope estimator with cross-domain consistency regularization to learn a compact, resilient envelope of the real-image distribution. Evaluated on eight benchmarks, REM achieves an average improvement of 7.5% in detection accuracy over state-of-the-art methods. On our newly constructed RealChain benchmark, which is designed to simulate realistic cascaded degradation, it also significantly outperforms existing methods, establishing a foundation for robust real-world deployment of AI-generated image detection.
📝 Abstract
The rapid progress of generative models has intensified the need for reliable and robust detection of AI-generated images under real-world conditions. However, existing detectors often overfit to generator-specific artifacts and remain highly sensitive to real-world degradations. As generative architectures evolve and images undergo multi-round cross-platform sharing and post-processing (chain degradations), these artifact cues become obsolete and harder to detect. To address this, we propose Real-centric Envelope Modeling (REM), a new paradigm that shifts detection from learning generator artifacts to modeling the robust distribution of real images. REM introduces feature-level perturbations in self-reconstruction to generate near-real samples, and employs an envelope estimator with cross-domain consistency to learn a boundary enclosing the real image manifold. We further build RealChain, a comprehensive benchmark covering both open-source and commercial generators with simulated real-world degradation. Across eight benchmark evaluations, REM achieves an average improvement of 7.5% over state-of-the-art methods, and notably maintains exceptional generalization on the severely degraded RealChain benchmark, establishing a solid foundation for synthetic image detection under real-world conditions. The code and the RealChain benchmark will be made publicly available upon acceptance of the paper.
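The real-centric idea described above can be illustrated with a toy sketch. This is not the authors' implementation, and every detail is an assumption made for illustration: features are plain 2-D vectors, near-real samples are real features plus Gaussian noise, and the envelope is a sphere whose radius is chosen to keep real features inside and near-real features outside. The names `perturb`, `fit_envelope`, and `envelope_score` are hypothetical.

```python
import math
import random


def mean_vector(points):
    """Component-wise mean of a list of equal-length vectors."""
    dim = len(points[0])
    return [sum(p[i] for p in points) / len(points) for i in range(dim)]


def distance(a, b):
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def perturb(feature, sigma, rng):
    """Hypothetical near-real sample: a real feature plus Gaussian noise."""
    return [x + rng.gauss(0.0, sigma) for x in feature]


def fit_envelope(real, near_real):
    """Fit a spherical envelope (center, radius) around the real features.

    The radius is the candidate distance that best keeps real features
    inside the sphere and near-real features outside it.
    """
    center = mean_vector(real)
    d_real = [distance(p, center) for p in real]
    d_near = [distance(p, center) for p in near_real]

    def n_correct(radius):
        inside = sum(d <= radius for d in d_real)
        outside = sum(d > radius for d in d_near)
        return inside + outside

    radius = max(d_real + d_near, key=n_correct)
    return center, radius


def envelope_score(feature, center, radius):
    """Signed distance to the envelope: negative inside, positive outside."""
    return distance(feature, center) - radius


rng = random.Random(0)
# Stand-in "real" features: 2-D Gaussian points (an assumption, not the paper's features).
real = [[rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)] for _ in range(200)]
near_real = [perturb(p, 2.0, rng) for p in real]
center, radius = fit_envelope(real, near_real)
```

In this sketch a sample is flagged as likely synthetic when `envelope_score` is positive. The point of the design is that the boundary is fitted only from real features and their perturbed copies, so no generator-specific artifact is needed at training time.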