🤖 AI Summary
With AI-generated content proliferating rapidly, post-hoc image watermarking schemes face a severe security threat from black-box watermark forgery attacks, in which the attacker has no access to the original watermarking model. Method: This paper proposes the first transferable, one-shot watermark forgery method: it requires only a single watermarked image and no access to the target watermark model. The approach has two key components: (1) an unsupervised watermark detection framework built on an image preference model trained with a ranking loss; and (2) a hybrid optimization strategy that combines procedural image generation with gradient-based backpropagation to remove and re-inject watermarks under black-box constraints. Results: Extensive experiments across multiple state-of-the-art post-hoc watermarking systems show both high attack success rates and strong cross-model transferability, exposing fundamental robustness vulnerabilities of existing schemes in unknown-model scenarios and establishing a new benchmark for watermark robustness evaluation and defense design.
📝 Abstract
Recent years have seen a surge of interest in digital content watermarking, driven by the proliferation of generative models and increasing legal pressure. With an ever-growing share of online content being AI-generated, watermarking plays an increasingly important role in ensuring content authenticity and attribution at scale. Many works have assessed the robustness of watermarking against removal attacks, yet watermark forging, the scenario in which a watermark is stolen from genuine content and applied to malicious content, remains underexplored. In this work, we investigate watermark forging in the context of widely used post-hoc image watermarking. Our contributions are as follows. First, we introduce a preference model that assesses whether an image is watermarked. The model is trained with a ranking loss on purely procedurally generated images, without any need for real watermarks. Second, we demonstrate the model's ability to remove and forge watermarks by optimizing the input image through backpropagation. This technique requires only a single watermarked image and no knowledge of the watermarking model, making our attack far simpler and more practical than those introduced in related work. Third, we evaluate the proposed method on a variety of post-hoc image watermarking models and show that it can effectively forge watermarks, calling into question the security of current watermarking approaches. Our code and further resources are publicly available.
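To make the two-step idea concrete, here is a minimal NumPy sketch of the general pattern the abstract describes: a preference model trained with a hinge-style ranking loss on procedurally generated clean/watermarked pairs, then gradient ascent on an input image to raise its preference score ("forging"). Everything here is a toy stand-in and an assumption, not the paper's implementation: the scorer is linear rather than a learned image network, the "watermark" is a hypothetical fixed high-frequency pattern, and all dimensions and step sizes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 64  # toy "image" dimension (assumption: the real method operates on pixels with a CNN)

# Procedural pair generation: a clean signal and the same signal with a
# synthetic high-frequency perturbation standing in for a watermark.
def make_pair():
    clean = rng.normal(size=D)
    wm = clean + 0.3 * np.sin(np.arange(D) * 1.7)  # hypothetical watermark pattern
    return clean, wm

# Linear preference model s(x) = w @ x, trained with a hinge ranking loss
# max(0, margin - (s(wm) - s(clean))) so watermarked inputs score higher.
w = np.zeros(D)
margin, lr = 1.0, 0.05
for _ in range(500):
    clean, wm = make_pair()
    if margin - (w @ wm - w @ clean) > 0:
        w += lr * (wm - clean)  # gradient step on the active hinge

# "Forging": push a fresh, unwatermarked image toward a high preference score
# by ascending the score's gradient w.r.t. the input (for a linear scorer, just w).
target = rng.normal(size=D)
forged = target.copy()
for _ in range(50):
    forged += 0.05 * w  # d s(x)/dx = w

print(w @ forged > w @ target)  # → True: the optimized image now "looks watermarked"
```

In the paper's actual setting the scorer is differentiable end-to-end, so the same gradient ascent (or descent, for removal) runs through backpropagation on pixels; the sketch only shows the shape of that loop.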