WISER: Segmenting watermarked region - an epidemic change-point perspective

📅 2025-09-25
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing watermark detection methods struggle to localize watermarked regions in mixed-source text in a way that is fine-grained, scalable, and backed by theoretical guarantees that hold up under paraphrasing and post-editing. This paper proposes WISER, which formulates watermark segmentation as an *epidemic change-point detection* problem. Building on both the similarities and the differences between the two problems, it yields a computationally efficient segmentation algorithm with finite-sample error bounds and a consistency guarantee for detecting multiple watermarked segments in a single text. In experiments on several benchmark datasets embedded with diverse watermarking schemes, WISER outperforms state-of-the-art baselines in both accuracy and computational speed, complementing its theoretical results.

📝 Abstract
With the increasing popularity of large language models, concerns over content authenticity have led to the development of myriad watermarking schemes. These schemes can be used to detect machine-generated text via an appropriate key, while remaining imperceptible to readers without such a key. The corresponding detection mechanisms usually take the form of statistical hypothesis tests for the existence of watermarks, spurring extensive research in this direction. However, the finer-grained problem of identifying which segments of a mixed-source text are actually watermarked is much less explored; existing approaches either lack scalability or lack theoretical guarantees that are robust to paraphrasing and post-editing. In this work, we introduce a unique perspective on such watermark segmentation problems through the lens of epidemic change-points. By highlighting the similarities as well as the differences between these two problems, we motivate and propose WISER: a novel, computationally efficient watermark segmentation algorithm. We theoretically validate our algorithm by deriving finite-sample error bounds and establishing its consistency in detecting multiple watermarked segments in a single text. Complementing these theoretical results, our extensive numerical experiments show that WISER outperforms state-of-the-art baseline methods in both computational speed and accuracy on various benchmark datasets embedded with diverse watermarking schemes. Our theoretical and empirical findings establish WISER as an effective tool for watermark localization in most settings. The work also shows how insights from a classical statistical problem can lead to a theoretically valid and computationally efficient solution to a modern and pertinent problem.
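
To make the epidemic change-point analogy concrete, here is a minimal sketch of the classical single-segment version of the problem. It is not WISER itself, whose details are not given here. It assumes each token has already been reduced to a binary score (1 if the token falls in a watermark's "green list", 0 otherwise), that unwatermarked tokens score 1 with a known probability `p0`, and that watermarked tokens score 1 with an assumed higher probability `p1`. Under this model, the most likely "epidemic" interval `[start, end)` maximizes the Bernoulli log-likelihood ratio, which reduces to a maximum-sum subarray problem solvable in a single O(n) pass. The function name `epidemic_segment` and its parameters are hypothetical.

```python
import math
from typing import Sequence, Tuple


def epidemic_segment(scores: Sequence[int], p0: float, p1: float) -> Tuple[int, int, float]:
    """Hypothetical helper: return (start, end, llr) for the interval [start, end)
    that maximizes the log-likelihood ratio of 'watermarked' (green rate p1)
    versus 'unwatermarked' (green rate p0). An empty interval gives llr = 0."""
    if not 0.0 < p0 < p1 < 1.0:
        raise ValueError("require 0 < p0 < p1 < 1")
    # Per-token log-likelihood ratio contributions under the Bernoulli model.
    w_hit = math.log(p1 / p0)                # green token: positive evidence
    w_miss = math.log((1 - p1) / (1 - p0))   # non-green token: negative evidence

    best, best_start, best_end = 0.0, 0, 0
    run, run_start = 0.0, 0
    for i, x in enumerate(scores):
        run += w_hit if x else w_miss
        if run <= 0.0:
            # A non-positive prefix can never help a later interval: restart after i.
            run, run_start = 0.0, i + 1
        elif run > best:
            best, best_start, best_end = run, run_start, i + 1
    return best_start, best_end, best


print(epidemic_segment([0, 0, 1, 1, 1, 0, 1, 1, 1, 0, 0], p0=0.25, p1=0.75))
```

Here the isolated non-green token at index 5 is absorbed into the segment because the surrounding green tokens outweigh it, so the scan returns the interval `[2, 9)`. In practice the returned log-likelihood ratio would still need to be compared against a calibrated threshold before declaring any segment watermarked.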

Problem

Identifying watermarked segments in mixed-source texts
Developing a scalable algorithm robust to paraphrasing and post-editing
Providing theoretical guarantees for watermark localization accuracy

Innovation

Epidemic change-point perspective for watermark segmentation
Computationally efficient algorithm with theoretical guarantees
Detects multiple watermarked segments in mixed-source texts