Improved Detection of Latent Diffusion-Generated Images through Aligned Datasets

📅 2024-10-15

📈 Citations: 2

✨ Influential: 1

🤖 AI Summary

Problem: Existing detectors for images generated by latent diffusion models (LDMs) pick up spurious correlations that arise from semantic and geometric mismatches between the real and synthetic images they are trained on, which undermines generalization and robustness. Method: We propose a semantically and geometrically aligned way to construct training data: each real image is reconstructed with the LDM's autoencoder (encode, then decode), bypassing the costly denoising process, so that the detector is pushed to focus on intrinsic decoder artifacts. Contribution/Results: We show empirically that training-data alignment has a significant effect on detection robustness. The denoising-free alignment strategy makes paired real/fake samples inexpensive to generate. A detector trained on images that are not natural objects still gives promising results. The resulting detector depends markedly less on confounding factors (e.g., resolution, file format), achieves superior cross-model generalization and adversarial robustness compared to state-of-the-art methods, and incurs significantly lower training overhead.

📝 Abstract

As latent diffusion models (LDMs) democratize image generation capabilities, there is a growing need to detect fake images. A good detector should focus on the generative model's fingerprints while ignoring image properties such as semantic content, resolution, file format, etc. Fake image detectors are usually built in a data-driven way, where a model is trained to separate real from fake images. Existing works primarily investigate network architecture choices and training recipes. In this work, we argue that in addition to these algorithmic choices, we also require a well-aligned dataset of real/fake images to train a robust detector. For the family of LDMs, we propose a very simple way to achieve this: we reconstruct all the real images using the LDM's autoencoder, without any denoising operation. We then train a model to separate these real images from their reconstructions. The fakes created this way are extremely similar to the real ones in almost every aspect (e.g., size, aspect ratio, semantic content), which forces the model to look for the LDM decoder's artifacts. We empirically show that this way of creating aligned real/fake datasets, which also sidesteps the computationally expensive denoising process, helps in building a detector that focuses less on spurious correlations, something that a very popular existing method is susceptible to. Finally, to demonstrate just how effective the alignment in a dataset can be, we build a detector using images that are not natural objects, and present promising results. Overall, our work identifies the subtle but significant issues that arise when training a fake image detector and proposes a simple and inexpensive solution to address these problems.

Problem

Research questions and friction points this paper is trying to address.

Detecting fake images from latent diffusion models

Aligning datasets to train robust fake image detectors

Reducing spurious correlations in fake image detection
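
A toy check can make the spurious-correlation problem concrete. The Python sketch below is purely illustrative and not taken from the paper: the image shapes, the 512x512 generator output size, and the `size_only_detector` rule are all hypothetical. It compares a "detector" that looks only at image size on a misaligned dataset (real images of varied sizes, fakes all at one size) and on an aligned dataset (each fake shares the size of its real counterpart).

```python
from typing import List, Tuple

Shape = Tuple[int, int]
Sample = Tuple[Shape, int]  # (image shape, label) with 0 = real, 1 = fake


def size_only_detector(shape: Shape, fake_shape: Shape = (512, 512)) -> int:
    """Hypothetical shortcut: call an image fake iff it has the assumed generator size."""
    return 1 if shape == fake_shape else 0


def accuracy(samples: List[Sample]) -> float:
    """Fraction of samples the size-only shortcut labels correctly."""
    correct = sum(size_only_detector(shape) == label for shape, label in samples)
    return correct / len(samples)


# Made-up shapes for four real photos.
real_shapes = [(480, 640), (1024, 768), (600, 800), (333, 500)]

# Misaligned: every fake has the generator's fixed output size.
misaligned = [(s, 0) for s in real_shapes] + [((512, 512), 1)] * 4

# Aligned: every fake is a reconstruction with the same shape as its real image.
aligned = [(s, 0) for s in real_shapes] + [(s, 1) for s in real_shapes]

print(accuracy(misaligned))  # 1.0: the shortcut separates the classes perfectly
print(accuracy(aligned))     # 0.5: the shortcut is no better than chance
```

On the aligned data the size cue carries no label information, so a trained detector would have to rely on something else, such as decoder artifacts.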

Innovation

Methods, ideas, or system contributions that make the work stand out.

Reconstruct real images using the LDM's autoencoder, with no denoising step.

Train a model to separate real images from their reconstructions.

Focus on LDM decoder artifacts for detection.
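
The data-construction step listed above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the authors' implementation: `toy_encode` and `toy_decode` are hypothetical stand-ins (2x2 average pooling and nearest-neighbour upsampling on grayscale images with even dimensions), and `build_aligned_pairs` is an illustrative name. In practice the two callables would presumably wrap the encoder and decoder of the pretrained LDM autoencoder.

```python
from typing import Callable, List, Tuple

Image = List[List[float]]  # grayscale image as rows of pixel values in [0, 1]


def toy_encode(img: Image) -> Image:
    """Stand-in encoder: 2x2 average pooling (NOT a real LDM encoder).

    Assumes the image height and width are both even.
    """
    h, w = len(img), len(img[0])
    return [
        [
            (img[y][x] + img[y][x + 1] + img[y + 1][x] + img[y + 1][x + 1]) / 4.0
            for x in range(0, w - 1, 2)
        ]
        for y in range(0, h - 1, 2)
    ]


def toy_decode(latent: Image) -> Image:
    """Stand-in decoder: nearest-neighbour 2x upsampling (NOT a real LDM decoder)."""
    out: Image = []
    for row in latent:
        up = [v for v in row for _ in range(2)]  # repeat each value horizontally
        out.append(up)
        out.append(list(up))  # repeat the row vertically
    return out


def build_aligned_pairs(
    real_images: List[Image],
    encode: Callable[[Image], Image],
    decode: Callable[[Image], Image],
) -> List[Tuple[Image, int]]:
    """Pair each real image (label 0) with its reconstruction (label 1).

    The reconstruction is a single encode -> decode pass; no denoising steps run.
    """
    dataset: List[Tuple[Image, int]] = []
    for img in real_images:
        recon = decode(encode(img))
        dataset.append((img, 0))
        dataset.append((recon, 1))
    return dataset


real = [[[0.0, 1.0, 0.0, 1.0], [1.0, 0.0, 1.0, 0.0]]]
pairs = build_aligned_pairs(real, toy_encode, toy_decode)
for image, label in pairs:
    print(label, len(image), len(image[0]))  # label, height, width
```

Each real image and its reconstruction share the same height and width, so size and aspect ratio cannot separate the two classes; only what the decoder changed in the pixels can.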

🔎 Similar Papers

LaRE2: Latent Reconstruction Error Based Method for Diffusion-Generated Image Detection