🤖 AI Summary
To address catastrophic forgetting during CLIP fine-tuning—which degrades cross-domain generalization—this paper proposes a distillation-guided gradient surgery framework. The core innovation lies in gradient-space decomposition and orthogonal projection: the optimization direction is decomposed into beneficial components (preserving pretrained priors) and harmful components (inducing forgetting); task gradients are projected away from the harmful directions and aligned with beneficial directions distilled from a frozen CLIP encoder, thereby jointly preserving prior knowledge and suppressing task-irrelevant features. The method combines knowledge distillation with feature alignment to the frozen encoder. Experiments across 50 generative image models show an average accuracy gain of 6.6 percentage points over state-of-the-art methods, significantly improving detection performance and cross-model generalization.
📝 Abstract
The rapid progress of generative models such as GANs and diffusion models has led to the widespread proliferation of AI-generated images, raising concerns about misinformation, privacy violations, and erosion of trust in digital media. Although large-scale multimodal models like CLIP offer strong transferable representations for detecting synthetic content, fine-tuning them often induces catastrophic forgetting, which degrades pre-trained priors and limits cross-domain generalization. To address this issue, we propose the Distillation-guided Gradient Surgery Network (DGS-Net), a novel framework that preserves transferable pre-trained priors while suppressing task-irrelevant components. Specifically, we introduce a gradient-space decomposition that separates harmful and beneficial descent directions during optimization. By projecting task gradients onto the orthogonal complement of harmful directions and aligning them with beneficial ones distilled from a frozen CLIP encoder, DGS-Net unifies prior preservation and irrelevant-component suppression in a single optimization. Extensive experiments on 50 generative models demonstrate that our method outperforms state-of-the-art approaches by an average margin of 6.6 percentage points, achieving superior detection performance and generalization across diverse generation techniques.
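The gradient-surgery step described above can be sketched as follows. This is a minimal illustration, not the paper's exact formulation: the function names, the single harmful/beneficial direction (the paper decomposes a gradient space), and the additive alignment weight are all assumptions made for clarity.

```python
import numpy as np

def gradient_surgery(g_task, g_harmful, g_beneficial, align_weight=0.5):
    """Illustrative sketch of distillation-guided gradient surgery.

    Projects the task gradient onto the orthogonal complement of a
    harmful (forgetting-inducing) direction, then mixes in a beneficial
    direction distilled from a frozen teacher. The mixing rule and
    weight are assumptions, not DGS-Net's exact update.
    """
    # Unit vector along the harmful direction (epsilon avoids div-by-zero).
    h = g_harmful / (np.linalg.norm(g_harmful) + 1e-12)
    # Remove the component of the task gradient along the harmful direction.
    g_orth = g_task - np.dot(g_task, h) * h
    # Align with the beneficial (distilled) direction.
    return g_orth + align_weight * g_beneficial

g_task = np.array([1.0, 1.0, 0.0])   # raw fine-tuning gradient
g_harm = np.array([1.0, 0.0, 0.0])   # direction that erodes priors
g_good = np.array([0.0, 0.0, 1.0])   # direction distilled from frozen CLIP
g = gradient_surgery(g_task, g_harm, g_good)
print(np.dot(g, g_harm))  # ~0: harmful component has been removed
```

The key invariant is that the surgically edited gradient has (near-)zero inner product with the harmful direction, so a descent step no longer moves the model along the forgetting-inducing axis.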