NS-Net: Decoupling CLIP Semantic Information through NULL-Space for Generalizable AI-Generated Image Detection

📅 2025-08-02
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing AI-generated image detectors generalize poorly to unseen generative models, especially when forged and authentic images share highly similar semantic content. Method: The authors identify that the high-level semantic information in CLIP's visual features impedes discrimination of generation artifacts, and propose a NULL-space projection framework to explicitly decouple semantic information from generative traces. They further design a local-aware Patch Selection mechanism to mitigate the semantic bias introduced by global structural cues, and integrate contrastive learning to sharpen decision boundaries. Contribution/Results: The resulting end-to-end detector achieves a 7.4% absolute accuracy gain over state-of-the-art methods on an open-world benchmark covering 40 diverse generative models, demonstrating significantly improved generalization to previously unseen generators.

📝 Abstract
The rapid progress of generative models, such as GANs and diffusion models, has facilitated the creation of highly realistic images, raising growing concerns over their misuse in security-sensitive domains. While existing detectors perform well under known generative settings, they often fail to generalize to unknown generative models, especially when semantic content between real and fake images is closely aligned. In this paper, we revisit the use of CLIP features for AI-generated image detection and uncover a critical limitation: the high-level semantic information embedded in CLIP's visual features hinders effective discrimination. To address this, we propose NS-Net, a novel detection framework that leverages NULL-Space projection to decouple semantic information from CLIP's visual features, followed by contrastive learning to capture intrinsic distributional differences between real and generated images. Furthermore, we design a Patch Selection strategy to preserve fine-grained artifacts by mitigating semantic bias caused by global image structures. Extensive experiments on an open-world benchmark comprising images generated by 40 diverse generative models show that NS-Net outperforms existing state-of-the-art methods, achieving a 7.4% improvement in detection accuracy, thereby demonstrating strong generalization across both GAN- and diffusion-based image generation techniques.
Problem

Research questions and friction points this paper is trying to address.

Detecting AI-generated images across unknown models
Decoupling semantic info from CLIP for better detection
Improving generalization in GAN and diffusion image detection
Innovation

Methods, ideas, or system contributions that make the work stand out.

NULL-Space projection decouples CLIP semantic information
Contrastive learning captures real vs generated differences
Patch Selection mitigates semantic bias in global structures
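The core idea in the first bullet, removing the component of a CLIP feature that lies in a known semantic subspace by projecting onto that subspace's orthogonal complement, can be sketched as follows. This is a minimal illustration, not the paper's implementation: the feature dimension, the number of semantic directions, and how the semantic subspace `S` is estimated are all assumptions here.

```python
import numpy as np

def null_space_projector(S):
    """Build a projector onto the orthogonal complement (null space)
    of the column space of S, a (d x k) matrix of semantic directions.

    Hypothetical sketch -- the paper's exact construction may differ.
    """
    # Orthonormal basis Q for the semantic subspace via QR decomposition
    Q, _ = np.linalg.qr(S)
    d = S.shape[0]
    # P = I - Q Q^T removes any component lying in span(S)
    return np.eye(d) - Q @ Q.T

rng = np.random.default_rng(0)
d, k = 512, 8                      # CLIP feature dim / #directions: illustrative only
S = rng.standard_normal((d, k))    # stand-in for learned semantic directions
P = null_space_projector(S)

f = rng.standard_normal(d)         # stand-in for a CLIP visual feature
f_res = P @ f                      # semantic-decoupled residual feature

# The residual is orthogonal to every semantic direction
print(np.allclose(S.T @ f_res, 0.0, atol=1e-6))  # → True
```

The residual `f_res` carries no component along the semantic directions, so a detector trained on it (e.g., with the contrastive objective in the second bullet) is pushed toward low-level generative traces rather than content.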
Authors
Jiazhen Yan — Nanjing University of Information Science and Technology — AIGC Detection, AI Security
Fan Wang — School of Computer Science, Nanjing University of Information Science and Technology
Weiwei Jiang — School of Computer Science, Nanjing University of Information Science and Technology
Ziqiang Li — Associate Professor, Nanjing University of Information Science and Technology — AIGC, Backdoor Learning, AI Security
Zhangjie Fu — School of Computer Science, Nanjing University of Information Science and Technology