🤖 AI Summary
Existing robust invisible watermarking schemes are highly vulnerable to diffusion-based image editing. This work provides the first information-theoretic proof that the diffusion process, through image regeneration, can drive the mutual information between the embedded watermark payload and the image that carries it arbitrarily close to zero, thereby effectively erasing the watermark while preserving visual fidelity. Building on this insight, the authors propose a guided diffusion attack strategy that explicitly attenuates the watermark signal. Experiments demonstrate that state-of-the-art watermarking methods (including StegaStamp, TrustMark, and VINE) exhibit near-zero recovery rates under this attack, exposing a fundamental security vulnerability in current watermarking technologies in the era of generative AI.
📝 Abstract
Robust invisible watermarking schemes aim to embed hidden information into images such that the watermark survives common manipulations. However, powerful diffusion-based image generation and editing techniques now pose a new threat to these watermarks. In this paper, we present a theoretical and empirical analysis demonstrating that diffusion models can effectively erase robust watermarks even when those watermarks were designed to withstand conventional distortions. We show that a diffusion-driven image regeneration process, which leverages generative models to recreate an image, can remove embedded watermarks while preserving the image's perceptual content. Furthermore, we introduce a guided diffusion-based attack that explicitly targets the embedded watermark signal during generation, significantly degrading watermark detectability. Theoretically, we prove that as a watermarked image undergoes sufficient diffusion transformations, the mutual information between the transformed image and the hidden payload approaches zero, leading to inevitable decoding failure. Experimentally, we evaluate multiple state-of-the-art watermarking methods (including deep learning-based schemes like StegaStamp, TrustMark, and VINE) and demonstrate that diffusion-based edits drive watermark recovery rates to near zero while maintaining high visual fidelity of the regenerated images. Our findings reveal a fundamental vulnerability in current robust watermarking techniques against generative model-based edits, underscoring the need for new strategies to ensure watermark resilience in the era of powerful diffusion models.
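
The mutual-information claim can be made concrete with a small numerical sketch. This is not the paper's proof. It is a standard Gaussian-channel bound, under assumptions that do not come from the abstract: a DDPM-style forward process x_t = sqrt(a_t) x_0 + sqrt(1 - a_t) eps with a linear beta schedule, and pixels normalized to average power at most `power`. Because the payload influences x_t only through the watermarked image x_0, the data-processing inequality gives I(payload; x_t) <= I(x_0; x_t), and the Gaussian channel bounds the latter per pixel by 0.5 * log2(1 + a_t * power / (1 - a_t)). The function names and schedule parameters below are hypothetical.

```python
import math


def alpha_bar(t, num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative signal-retention factor a_t after t forward noising steps.

    Assumes a linear beta schedule (an illustrative choice, not taken
    from the abstract).
    """
    prod = 1.0
    for i in range(t):
        beta = beta_start + (beta_end - beta_start) * i / (num_steps - 1)
        prod *= 1.0 - beta
    return prod


def mi_upper_bound_bits(t, power=1.0, **schedule):
    """Per-pixel upper bound, in bits, on I(x_0; x_t).

    By the data-processing inequality this also bounds the information
    that x_t can retain about the watermark payload.
    """
    a = alpha_bar(t, **schedule)
    if a >= 1.0:
        return math.inf  # no noise added yet, so the bound is vacuous
    snr = a * power / (1.0 - a)
    return 0.5 * math.log2(1.0 + snr)


# The bound shrinks monotonically as more noise is injected.
steps = (1, 50, 250, 500, 1000)
bounds = [mi_upper_bound_bits(t) for t in steps]
for t, b in zip(steps, bounds):
    print(f"t={t:4d}  bound={b:.6f} bits/pixel")
```

The bound is crude: it limits all information about x_0, not only the watermark, so it does not by itself explain why perceptual content survives regeneration. It only illustrates why a payload spread thinly across pixels has little room to survive once the noise level is high.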