Vanishing Watermarks: Diffusion-Based Image Editing Undermines Robust Invisible Watermarking

📅 2026-02-24

🤖 AI Summary

Existing robust invisible watermarking schemes are highly vulnerable to diffusion-based image editing. The paper gives an information-theoretic argument that diffusion-driven image regeneration can push the mutual information between the processed image and the hidden payload arbitrarily close to zero, erasing the watermark while preserving visual fidelity. Building on this result, the authors propose a guided diffusion attack that explicitly attenuates the watermark signal. In their experiments, state-of-the-art watermarking methods, including StegaStamp, TrustMark, and VINE, show near-zero recovery rates under this attack, which exposes a fundamental vulnerability of current watermarking techniques to generative-model edits.
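
As a rough illustration of why mutual information can vanish under diffusion, the following is a standard data-processing sketch, not the paper's own proof. All notation is assumed for illustration: $M$ is the payload, $X$ the cover image (independent of $M$), $X_w$ the watermarked image, $\delta = X_w - X$ the watermark residual with average power $\mathbb{E}\|\delta\|^2 \le dP$ over $d$ pixels, $X_t$ the image after DDPM-style forward noising with cumulative signal level $\bar\alpha_t$, and $\hat{X}$ the regenerated image, which depends on $X_w$ only through $X_t$.

```latex
% Sketch under assumed notation; information is measured in nats.
\begin{align*}
X_t &= \sqrt{\bar\alpha_t}\,X_w + \sqrt{1-\bar\alpha_t}\,\varepsilon,
  \qquad \varepsilon \sim \mathcal{N}(0, I_d)\ \text{independent of } (M, X), \\
I(M;\hat{X}) &\le I(M; X_t)
  && \text{data processing along } M \to X_w \to X_t \to \hat{X} \\
&\le I(M; X_t \mid X)
  && \text{since } M \text{ is independent of } X \\
&\le I(\delta; X_t \mid X)
  && \text{given } X,\ X_t \text{ depends on } M \text{ only through } \delta \\
&\le \frac{d}{2}\,\log\!\left(1 + \frac{\bar\alpha_t\,P}{1-\bar\alpha_t}\right)
  && \text{capacity of a power-limited Gaussian channel.}
\end{align*}
```

The final bound tends to zero as $\bar\alpha_t \to 0$, that is, as the noising becomes strong enough. It says nothing by itself about visual fidelity, which in a real attack comes from the generative prior used during denoising.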

📝 Abstract

Robust invisible watermarking schemes aim to embed hidden information into images such that the watermark survives common manipulations. However, powerful diffusion-based image generation and editing techniques now pose a new threat to these watermarks. In this paper, we present a comprehensive theoretical and empirical analysis demonstrating that diffusion models can effectively erase robust watermarks even when those watermarks were designed to withstand conventional distortions. We show that a diffusion-driven image regeneration process, which leverages generative models to recreate an image, can remove embedded watermarks while preserving the image's perceptual content. Furthermore, we introduce a guided diffusion-based attack that explicitly targets the embedded watermark signal during generation, significantly degrading watermark detectability. Theoretically, we prove that as an image undergoes sufficient diffusion transformations, the mutual information between the watermarked image and the hidden payload approaches zero, leading to inevitable decoding failure. Experimentally, we evaluate multiple state-of-the-art watermarking methods (including deep learning-based schemes like StegaStamp, TrustMark, and VINE) and demonstrate that diffusion edits yield near-zero watermark recovery rates after attack, while maintaining high visual fidelity of the regenerated images. Our findings reveal a fundamental vulnerability in current robust watermarking techniques against generative model-based edits, underscoring the need for new strategies to ensure watermark resilience in the era of powerful diffusion models.
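
The noising half of a regeneration attack can be illustrated with a toy simulation. The sketch below is not one of the schemes evaluated in the paper: it assumes a simple additive spread-spectrum watermark, a random Gaussian vector standing in for the cover image, and a DDPM-style forward step parameterised by a hypothetical cumulative signal level `alpha_bar`. The denoising stage is omitted on purpose, since any processing of the noised image cannot recover payload information that the noise has already destroyed. All function names and parameter values are illustrative.

```python
import math
import random


def embed(cover, bits, carriers, strength):
    """Additive spread-spectrum embedding: cover + strength * sum_i bit_i * carrier_i."""
    marked = list(cover)
    for bit, carrier in zip(bits, carriers):
        for i, c in enumerate(carrier):
            marked[i] += strength * bit * c
    return marked


def forward_diffuse(x, alpha_bar, rng):
    """DDPM-style forward noising: sqrt(alpha_bar) * x + sqrt(1 - alpha_bar) * noise."""
    keep = math.sqrt(alpha_bar)
    spread = math.sqrt(1.0 - alpha_bar)
    return [keep * v + spread * rng.gauss(0.0, 1.0) for v in x]


def decode(y, carriers):
    """Blind correlation decoder: each bit is the sign of <y, carrier_i>."""
    return [1 if sum(c * v for c, v in zip(carrier, y)) >= 0 else -1
            for carrier in carriers]


def bit_accuracy(alpha_bar, n_pixels=4096, n_bits=16, strength=0.1,
                 trials=10, seed=0):
    """Fraction of payload bits recovered after noising, averaged over trials."""
    rng = random.Random(seed)
    correct = 0
    for _ in range(trials):
        cover = [rng.gauss(0.0, 1.0) for _ in range(n_pixels)]
        carriers = [[rng.choice((-1, 1)) for _ in range(n_pixels)]
                    for _ in range(n_bits)]
        bits = [rng.choice((-1, 1)) for _ in range(n_bits)]
        marked = embed(cover, bits, carriers, strength)
        noisy = forward_diffuse(marked, alpha_bar, rng)
        decoded = decode(noisy, carriers)
        correct += sum(d == b for d, b in zip(decoded, bits))
    return correct / (trials * n_bits)


if __name__ == "__main__":
    # Smaller alpha_bar means a later diffusion step, i.e. more noise.
    for alpha_bar in (0.999, 0.5, 0.05, 0.0001):
        print(f"alpha_bar={alpha_bar}: bit accuracy={bit_accuracy(alpha_bar):.3f}")
```

In this toy setting the bit accuracy stays high at light noise levels and falls toward chance (0.5) as `alpha_bar` approaches zero. Learned watermark decoders and real diffusion pipelines behave differently in detail, so the numbers are only indicative of the trend.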

Problem

Research questions and friction points this paper is trying to address.

invisible watermarking

diffusion models

image editing

watermark robustness

generative models

Innovation

Methods, ideas, or system contributions that make the work stand out.

diffusion models

invisible watermarking

watermark removal

guided diffusion attack

mutual information
