🤖 AI Summary
Current AI-generated image detectors often degrade significantly in real-world scenarios. A likely cause is transformation bias in the training data, which can lead models to rely on spurious correlations rather than genuine forensic features. To address this issue, the work proposes BIAS-ID, a transparent framework for evaluating transformation bias, for which no established protocol previously existed. By combining cross-dataset transformation testing, bias sensitivity analysis, and multi-detector comparison, BIAS-ID quantifies transformation bias and separates it from inherent robustness limitations. An evaluation of six detectors on two datasets shows that several state-of-the-art methods are strongly affected by transformation bias. This validates the framework and underscores the importance of bias-aware evaluation in image forensics.
📝 Abstract
Given the surge of harmful AI-generated imagery online, reliably distinguishing authentic images from generated ones has become an urgent research topic. While many proposed detection methods perform well under controlled settings, they often collapse when tested on real-world data. One potential root cause lies in subtle biases in the detectors' training data. As a result, detectors may rely on spurious correlations instead of learning true forensic artifacts. While a recent line of work has identified the problem, there is not yet an established protocol for evaluating how biased a detector actually is. In this work, we therefore take a step back: first, we discuss what it means for a detector to be biased, and how this differs from a lack of robustness. Second, we propose BIAS-ID, a transparent framework for analyzing and quantifying the presence of transformation biases in AI-generated image detectors. We validate our framework by evaluating six detectors on two datasets, revealing that several state-of-the-art detection methods are strongly affected by biases. Our results highlight the importance of bias-aware evaluation for developing reliable AI-generated image detectors.
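The abstract distinguishes bias from a lack of robustness but does not say how the two might show up in measurements. The following is a hypothetical illustration only, not the BIAS-ID metric. It assumes a detector that outputs a "fake" score in [0, 1] and a single transformation T (for example, re-compression) applied to both real and generated images. Under these assumptions, a bias-like effect appears as T pushing scores toward one label regardless of the true class (nonzero `label_drift`). A robustness-like effect appears as the two classes simply moving closer together (negative `separation_change` with little drift). The function name and both quantities are invented for this sketch.

```python
from statistics import mean


def transformation_probe(real_clean, real_transformed, fake_clean, fake_transformed):
    """Hypothetical probe of one transformation T (not the BIAS-ID metric).

    Each argument is a list of detector scores in [0, 1], where higher means
    "more likely AI-generated"; *_transformed holds scores after applying T.

    Returns (label_drift, separation_change):
      label_drift: per-class mean score shifts, averaged over both classes.
        A large magnitude means T pushes images toward one label regardless
        of their true class (bias-like).
      separation_change: change in the gap between mean fake and mean real
        scores. A drop with little drift means the classes merely become
        harder to separate (robustness-like).
    """
    real_shift = mean(real_transformed) - mean(real_clean)
    fake_shift = mean(fake_transformed) - mean(fake_clean)
    label_drift = (real_shift + fake_shift) / 2
    separation_change = fake_shift - real_shift
    return label_drift, separation_change


# Scenario A (made-up scores): T leaves real images untouched but drags
# generated images toward "real", as if T were a shortcut cue for "real".
drift_a, sep_a = transformation_probe(
    [0.1, 0.2, 0.1, 0.2], [0.1, 0.2, 0.1, 0.2],
    [0.9, 0.8, 0.9, 0.8], [0.3, 0.2, 0.3, 0.2],
)

# Scenario B (made-up scores): T pulls both classes toward 0.5 symmetrically.
drift_b, sep_b = transformation_probe(
    [0.1, 0.2, 0.1, 0.2], [0.4, 0.5, 0.4, 0.5],
    [0.9, 0.8, 0.9, 0.8], [0.6, 0.5, 0.6, 0.5],
)
print(drift_a, sep_a)
print(drift_b, sep_b)
```

Both made-up scenarios lose the same amount of separation (about -0.6). Only scenario A shows a net drift toward "real" (about -0.3), while scenario B's drift is approximately zero. A single accuracy number would not tell these two cases apart.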