FiSeR: Fine-Grained Source Representations for Cross-Domain AI Image Detection

📅 2026-05-30
📈 Citations: 0
Influential: 0

🤖 AI Summary
This work addresses the limited cross-domain generalization of existing AI-generated image detectors, which the authors attribute primarily to the classification head overfitting to artifacts specific to the training domain. To mitigate this, they propose a hierarchical contrastive learning framework that jointly optimizes a coarse-grained contrast between natural and synthetic images and a fine-grained contrast among synthetic images based on generator identity, a supervisory signal intended to encourage more transferable representations. For few-shot adaptation, the backbone is frozen and an SVM head is fit on 10 labeled samples per class; unsupervised UMAP projections are used to examine feature separability on unseen datasets. Trained on WildFake, the method achieves an average cross-domain AUROC gain of +10.22 over Chameleon, AIGIBench, Community Forensics, and GenImage under the same settings as the DIRE baseline. In the few-shot setting, it improves AUROC by +10.64 on AIGIBench and +17.41 on Chameleon, averaged over 12 widely used detectors.
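The few-shot adaptation idea (keep the backbone frozen and fit only a small SVM head on a handful of labeled target-domain samples) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the "frozen features" are simulated Gaussian clusters, the head is a linear SVM trained by plain subgradient descent in NumPy rather than whatever SVM implementation the authors use, and the names `fit_linear_svm`, `auroc`, and `fake_features` are hypothetical.

```python
import numpy as np

def fit_linear_svm(X, y, C=1.0, lr=0.01, epochs=500):
    """Fit a linear SVM head by subgradient descent on the regularised hinge loss.

    X: (n, d) features from a frozen backbone; y: (n,) labels in {0, 1}.
    """
    s = 2.0 * y - 1.0                      # map labels to {-1, +1}
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        margins = s * (X @ w + b)
        active = margins < 1.0             # samples inside or beyond the margin
        grad_w = w - C * (X[active].T @ s[active])
        grad_b = -C * s[active].sum()
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

def auroc(scores, labels):
    """AUROC as P(positive score > negative score), counting ties as 0.5."""
    pos = scores[labels == 1]
    neg = scores[labels == 0]
    diff = pos[:, None] - neg[None, :]
    wins = (diff > 0).sum() + 0.5 * (diff == 0).sum()
    return float(wins / (pos.size * neg.size))

# Hypothetical stand-in for frozen-backbone features: two Gaussian clusters.
rng = np.random.default_rng(0)
dim = 16

def fake_features(n_per_class):
    natural = rng.normal(-0.5, 1.0, size=(n_per_class, dim))
    synthetic = rng.normal(+0.5, 1.0, size=(n_per_class, dim))
    X = np.vstack([natural, synthetic])
    y = np.concatenate([np.zeros(n_per_class), np.ones(n_per_class)])
    return X, y

X_few, y_few = fake_features(10)       # 10 labeled samples per class
X_test, y_test = fake_features(200)    # held-out samples from the same domain

w, b = fit_linear_svm(X_few, y_few)    # only the head is trained
test_auroc = auroc(X_test @ w + b, y_test)
print(f"few-shot AUROC on simulated features: {test_auroc:.3f}")
```

Because only `w` and `b` are learned, the adaptation cost is independent of the backbone size; how well it works in practice depends on how separable the frozen features already are on the target domain.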
📝 Abstract
Real-world synthetic image detectors often generalize poorly under domain shift despite strong in-domain performance. Using unsupervised UMAP projections, we find that natural and synthetic features remain partially separable on unseen datasets, yet performance still drops, suggesting that the classification head overfits to training-domain artifacts. Therefore, the key is to learn more transferable representations so that the decision criterion is more stable and robust to domain shifts. Based on the structural fact that synthetic images are produced by diverse generators, we propose a hierarchical contrastive learning framework that improves the separability between natural and synthetic images while preserving generator identity information. It jointly optimizes (i) a coarse contrastive objective between natural and synthetic images and (ii) a fine contrastive objective among synthetic images using generator identities. Trained on WildFake, our method achieves an average AUROC gain of +10.22 on cross-domain evaluation over Chameleon, AIGIBench, Community Forensics, and GenImage under the same settings as the strong baseline DIRE. For few-shot adaptation, we freeze the backbone and fit an SVM head on 10 labeled samples per class, improving AUROC by +10.64 on AIGIBench and +17.41 on Chameleon, averaged over 12 widely used detectors. Our code is publicly available at: https://github.com/heyongxin233/FiSeR.
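The two-level objective described in the abstract can be sketched as follows. This is not the authors' loss: it assumes a SupCon-style supervised contrastive form for both levels, a hypothetical weighting parameter `fine_weight`, and a temperature of 0.1, none of which are specified in this summary. The function names and the toy embeddings are likewise illustrative.

```python
import numpy as np

def supcon_loss(z, labels, temperature=0.1):
    """SupCon-style supervised contrastive loss on L2-normalised embeddings.

    z: (n, d) embeddings; labels: (n,) integer labels.
    Anchors without any positive in the batch are skipped.
    """
    n = z.shape[0]
    if n < 2:
        return 0.0
    z = z / np.linalg.norm(z, axis=1, keepdims=True)
    sim = (z @ z.T) / temperature
    not_self = ~np.eye(n, dtype=bool)
    # Log-softmax over all other samples, shifted by the row max for stability.
    row_max = np.max(np.where(not_self, sim, -np.inf), axis=1, keepdims=True)
    exp = np.exp(sim - row_max) * not_self
    log_prob = sim - row_max - np.log(exp.sum(axis=1, keepdims=True))
    pos = (labels[:, None] == labels[None, :]) & not_self
    n_pos = pos.sum(axis=1)
    valid = n_pos > 0
    if not valid.any():
        return 0.0
    mean_log_prob_pos = (log_prob * pos).sum(axis=1)[valid] / n_pos[valid]
    return float(-mean_log_prob_pos.mean())

def hierarchical_contrastive_loss(z, is_synthetic, generator_id,
                                  fine_weight=1.0, temperature=0.1):
    """Coarse (natural vs. synthetic) plus fine (generator identity) terms."""
    syn = is_synthetic.astype(bool)
    coarse = supcon_loss(z, syn.astype(int), temperature)
    # The fine term only sees synthetic samples, labelled by their generator.
    fine = supcon_loss(z[syn], generator_id[syn], temperature)
    return coarse + fine_weight * fine, coarse, fine

# Toy batch: 4 natural images and 4 images from each of two generators.
rng = np.random.default_rng(0)
centers = np.array([[1.0, 0.0, 0.0, 0.0],     # natural
                    [-1.0, 0.3, 0.0, 0.0],    # generator 0
                    [-1.0, -0.3, 0.0, 0.0]])  # generator 1
z = np.repeat(centers, 4, axis=0) + 0.05 * rng.normal(size=(12, 4))
is_synthetic = np.array([0] * 4 + [1] * 8)
generator_id = np.array([-1] * 4 + [0] * 4 + [1] * 4)  # -1 marks natural images

total, coarse, fine = hierarchical_contrastive_loss(z, is_synthetic, generator_id)

# Interleave the rows so embeddings no longer match their labels.
mixed = np.arange(12).reshape(3, 4).T.ravel()
total_mixed, _, _ = hierarchical_contrastive_loss(z[mixed], is_synthetic, generator_id)
print(f"structured: {total:.2f}  mismatched: {total_mixed:.2f}")
```

In the toy batch the two generator clusters sit close together but far from the natural cluster, which is the geometry the combined objective rewards: the coarse term pulls all synthetic samples to one side, while the fine term keeps generators distinguishable within that side.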
Problem

Research questions and friction points this paper is trying to address.

domain shift
synthetic image detection
generalization
transferable representations
cross-domain evaluation
Innovation

Methods, ideas, or system contributions that make the work stand out.

hierarchical contrastive learning
fine-grained source representation
cross-domain generalization
synthetic image detection
generator identity preservation
Shan Zhang
Beijing University of Aeronautics and Astronautics (BUAA)
Resource Allocation, Green Communications, Energy Harvesting, 5G
Yongxin He
Key Lab of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China; University of Chinese Academy of Sciences, CAS, Beijing, China
Mingming Zhang
Beihang University
big data
Huiwen Tian
Key Lab of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China; University of Chinese Academy of Sciences, CAS, Beijing, China
Lei Ma
The Key Laboratory of Cognition and Decision Intelligence for Complex Systems, Institute of Automation, Chinese Academy of Sciences, Beijing, China; School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing, China