Provable Watermarking for Data Poisoning Attacks

📅 2025-10-10
🤖 AI Summary
Problem: In data poisoning attacks, the boundary between malicious and benign behavior is often ambiguous, so harmless poisoning can be misunderstood or misused. Method: This paper proposes two provable watermarking schemes, post-poisoning watermarking and poisoning-concurrent watermarking, which embed ownership identifiers into poisoned datasets so that they can be detected through statistical hypothesis testing. Contribution/Results: The paper establishes a theoretical relationship among watermark length, dimensionality $d$, and perturbation magnitude that ensures watermark detectability while preserving poisoning efficacy. Theoretical analysis and experiments across several attacks, models, and datasets support reliable identification of watermarked poisoned data with the attack's functionality retained. The approach offers a way for generators of harmless poisoning to assert ownership of their datasets and for users to identify potential poisoning before misuse.

📝 Abstract
In recent years, data poisoning attacks have been increasingly designed to appear harmless and even beneficial, often with the intention of verifying dataset ownership or safeguarding private data from unauthorized use. However, these developments have the potential to cause misunderstandings and conflicts, as data poisoning has traditionally been regarded as a security threat to machine learning systems. To address this issue, it is imperative for harmless poisoning generators to claim ownership of their generated datasets, enabling users to identify potential poisoning to prevent misuse. In this paper, we propose the deployment of watermarking schemes as a solution to this challenge. We introduce two provable and practical watermarking approaches for data poisoning: post-poisoning watermarking and poisoning-concurrent watermarking. Our analyses demonstrate that when the watermarking length is $\Theta(\sqrt{d}/\epsilon_w)$ for post-poisoning watermarking, and falls within the range of $\Theta(1/\epsilon_w^2)$ to $O(\sqrt{d}/\epsilon_p)$ for poisoning-concurrent watermarking, the watermarked poisoning dataset provably ensures both watermarking detectability and poisoning utility, certifying the practicality of watermarking under data poisoning attacks. We validate our theoretical findings through experiments on several attacks, models, and datasets.
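To make the length conditions in the abstract concrete, here is a minimal Python sketch that evaluates the stated scalings for given values of $d$, $\epsilon_w$, and $\epsilon_p$. The asymptotic bounds hide constants that the abstract does not give, so the multipliers `c`, `c_low`, and `c_high` are assumptions (set to 1 by default), and the function names are hypothetical. The output is an order-of-magnitude guide, not the paper's exact thresholds.

```python
import math


def post_poisoning_length(d, eps_w, c=1.0):
    """Scaling c * sqrt(d) / eps_w for post-poisoning watermarking.

    c is an assumed constant; the Theta(.) bound does not specify it.
    """
    return math.ceil(c * math.sqrt(d) / eps_w)


def concurrent_length_range(d, eps_w, eps_p, c_low=1.0, c_high=1.0):
    """Scaling range [c_low / eps_w^2, c_high * sqrt(d) / eps_p]
    for poisoning-concurrent watermarking.

    Returns (low, high, feasible); feasible is False when the
    range is empty under the assumed constants.
    """
    low = math.ceil(c_low / eps_w ** 2)
    high = math.floor(c_high * math.sqrt(d) / eps_p)
    return low, high, low <= high


if __name__ == "__main__":
    d, eps_w, eps_p = 10000, 0.5, 0.25
    print(post_poisoning_length(d, eps_w))
    print(concurrent_length_range(d, eps_w, eps_p))
```

Under these assumed constants, the poisoning-concurrent range is non-empty only when $1/\epsilon_w^2 \le \sqrt{d}/\epsilon_p$, so a small watermark budget $\epsilon_w$ combined with a low dimension $d$ can leave no admissible length.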
Problem

Research questions and friction points this paper is trying to address.

Developing provable watermarking for data poisoning attacks
Enabling ownership claims for harmless poisoning datasets
Ensuring watermark detectability and poisoning utility simultaneously
Innovation

Methods, ideas, or system contributions that make the work stand out.

Proposes two provable watermarking schemes for data poisoning
Post-poisoning watermarking requires length of Θ(√d/ε_w)
Poisoning-concurrent watermarking ensures detectability and utility for lengths between Θ(1/ε_w²) and O(√d/ε_p)
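As an illustration of what embedding and detecting a watermark of a given length can look like, the following toy Python sketch adds a secret ±1 key, scaled by a budget `eps_w`, to `q` chosen coordinates of a sample and detects it by thresholding the average correlation with the key. This is a generic additive-key scheme assumed for illustration only. It is not claimed to be the construction analyzed in the paper, and the names `make_key`, `embed`, and `detect`, as well as the threshold `eps_w / 2`, are hypothetical.

```python
import random


def make_key(d, q, seed=0):
    """Pick q of the d coordinates and a random +/-1 sign for each."""
    rng = random.Random(seed)
    idx = rng.sample(range(d), q)
    signs = [rng.choice((-1.0, 1.0)) for _ in range(q)]
    return idx, signs


def embed(x, key, eps_w):
    """Return a copy of x with eps_w * sign added on the key coordinates.

    Assumption: no clipping to a valid data range is applied here.
    """
    idx, signs = key
    y = list(x)
    for i, s in zip(idx, signs):
        y[i] += eps_w * s
    return y


def detect(x, key, eps_w):
    """Flag x as watermarked if its mean correlation with the key
    exceeds eps_w / 2 (an assumed threshold)."""
    idx, signs = key
    stat = sum(s * x[i] for i, s in zip(idx, signs)) / len(idx)
    return stat > eps_w / 2


if __name__ == "__main__":
    d, q, eps_w = 3072, 2000, 0.1
    rng = random.Random(1)
    x = [rng.uniform(-0.5, 0.5) for _ in range(d)]  # synthetic sample
    key = make_key(d, q)
    print(detect(x, key, eps_w), detect(embed(x, key, eps_w), key, eps_w))
```

In this toy setup the statistic for an unwatermarked sample fluctuates on the order of $1/\sqrt{q}$, while embedding shifts it by exactly `eps_w`, so reliable separation needs `q` on the order of $1/\epsilon_w^2$ or more. That intuition matches the lower end of the poisoning-concurrent range quoted above, but it is a heuristic for this sketch, not the paper's proof.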