Optimal Detection for Language Watermarks with Pseudorandom Collision

📅 2025-10-24

🤖 AI Summary

Problem: Existing language watermarking methods assume ideal pseudorandomness, yet large language models (LLMs) frequently generate text with repetitive patterns and structured dependencies. The resulting pseudorandom collisions invalidate classical statistical analysis and leave Type I error rates uncontrolled. Method: We propose a watermark detection framework tailored to *non-ideal pseudorandomness*. It introduces the concept of the “minimal unit”, the smallest group of statistics that can be treated as independent across units while permitting dependence within, and organizes the statistics into a hierarchical two-layer partition; detection is formulated as a minimax hypothesis testing problem. For Gumbel-max and inverse-transform watermarks, we derive closed-form optimal detection rules that explicitly account for within-unit dependence. Contribution/Results: The analysis is non-asymptotic and provides rigorous Type I error control in finite samples. Experiments confirm improved detection power. This work provides the first principled statistical foundation for watermark detection under imperfect pseudorandomness.

📝 Abstract

Text watermarking plays a crucial role in ensuring the traceability and accountability of large language model (LLM) outputs and mitigating misuse. While promising, most existing methods assume perfect pseudorandomness. In practice, repetition in generated text induces collisions that create structured dependence, compromising Type I error control and invalidating standard analyses. We introduce a statistical framework that captures this structure through a hierarchical two-layer partition. At its core is the concept of minimal units -- the smallest groups treatable as independent across units while permitting dependence within. Using minimal units, we define a non-asymptotic efficiency measure and cast watermark detection as a minimax hypothesis testing problem. Applied to Gumbel-max and inverse-transform watermarks, our framework produces closed-form optimal rules. It explains why discarding repeated statistics often improves performance and shows that within-unit dependence must be addressed unless degenerate. Both theory and experiments confirm improved detection power with rigorous Type I error control. These results provide the first principled foundation for watermark detection under imperfect pseudorandomness, offering both theoretical insight and practical guidance for reliable tracing of model outputs.
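
The remark that discarding repeated statistics often improves performance can be illustrated with a minimal sketch. It is not the paper's optimal rule: it uses the classical sum-of-logs score for Gumbel-max watermarks and simply skips any (context, token) pair that has already been seen. The hash-based `pivot` function, the context `window` of 2, and the secret `key` are assumptions made for illustration only.

```python
import hashlib
import math


def pivot(key: bytes, context: tuple, token: int) -> float:
    """Hypothetical pseudorandom uniform in (0, 1) derived from (key, context, token)."""
    digest = hashlib.sha256(key + repr((context, token)).encode()).digest()
    value = int.from_bytes(digest[:8], "big") >> 12  # keep 52 bits
    return (value + 0.5) / 2**52                     # strictly inside (0, 1)


def gamma_sf(x: float, n: int) -> float:
    """P(Gamma(n, 1) > x) for integer n, using the Erlang series."""
    term, total = 1.0, 1.0
    for k in range(1, n):
        term *= x / k
        total += term
    return math.exp(-x) * total


def detect(tokens, key: bytes, window: int = 2, dedup: bool = True):
    """Sum-of-logs score; with dedup=True, repeated (context, token) pairs are skipped."""
    seen = set()
    score, n = 0.0, 0
    for t in range(window, len(tokens)):
        unit = (tuple(tokens[t - window:t]), tokens[t])
        if dedup and unit in seen:
            continue  # a collision would reuse the same pseudorandom number
        seen.add(unit)
        y = pivot(key, unit[0], unit[1])
        score += -math.log(1.0 - y)  # Exp(1) under the null hypothesis
        n += 1
    # Under the null with n independent statistics, score ~ Gamma(n, 1).
    return score, n, gamma_sf(score, n)


if __name__ == "__main__":
    looping_text = [1, 2, 3] * 10
    print(detect(looping_text, b"secret", dedup=True))
    print(detect(looping_text, b"secret", dedup=False))
```

On a looping sequence such as `[1, 2, 3] * 10`, only three distinct pairs survive deduplication, so the p-value is computed against a Gamma(3, 1) reference rather than Gamma(28, 1), which would wrongly treat the 28 repeated statistics as independent.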

Problem

Research questions and friction points this paper is trying to address.

Addressing pseudorandom collision effects in text watermarking

Developing optimal detection rules for Gumbel-max and inverse-transform watermarks

Ensuring rigorous Type I error control under structured dependence
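
Why structured dependence threatens Type I error control can be seen in a small simulation on unwatermarked (null) data. The sketch assumes an extreme collision model in which every statistic is repeated exactly `repeats` times; this model, the function names, and the parameter values are illustrative assumptions rather than the paper's setup.

```python
import math
import random


def gamma_sf(x: float, n: int) -> float:
    """P(Gamma(n, 1) > x) for integer n, using the Erlang series."""
    term, total = 1.0, 1.0
    for k in range(1, n):
        term *= x / k
        total += term
    return math.exp(-x) * total


def false_positive_rate(n_unique: int, repeats: int, alpha: float = 0.05,
                        trials: int = 2000, dedup: bool = False,
                        seed: int = 0) -> float:
    """Fraction of null trials rejected by a sum test at nominal level alpha."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        # One Exp(1) score per distinct statistic under the null.
        units = [-math.log(1.0 - rng.random()) for _ in range(n_unique)]
        if dedup:
            score, n = sum(units), n_unique
        else:
            # Naive test: every repeated copy is counted as if independent.
            score, n = repeats * sum(units), repeats * n_unique
        if gamma_sf(score, n) < alpha:
            rejections += 1
    return rejections / trials


if __name__ == "__main__":
    print("naive:", false_positive_rate(20, 3, dedup=False))
    print("dedup:", false_positive_rate(20, 3, dedup=True))
```

Under this assumed model, the naive test rejects noticeably more often than the nominal level, while the deduplicated test stays close to it.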

Innovation

Methods, ideas, or system contributions that make the work stand out.

Hierarchical two-layer partition captures structured dependence

Minimal units are treated as independent of one another while permitting dependence within each unit

Closed-form optimal rules improve detection power with rigorous Type I error control

🔎 Similar Papers

Lost in Overlap: Exploring Logit-based Watermark Collision in LLMs