Optimized Couplings for Watermarking Large Language Models

📅 2025-05-13
🤖 AI Summary
This work addresses provenance attribution for large language model (LLM) outputs by investigating the fundamental limits of embedding low-perturbation watermarks in a single generation of text. It formalizes watermark detection as a hypothesis testing problem with side information, characterizing the intrinsic trade-off between detection power and textual distortion. Method: The authors propose an optimal coupling and randomization strategy grounded in worst-case token distributions under a min-entropy constraint, combining hypothesis testing, optimal transport, information theory, and randomized vocabulary partitioning. Contribution/Results: The framework yields a closed-form expression for the watermark detection rate and quantifies the distortion cost in a max-min sense. Evaluated on synthetic data and multiple real-world LLMs, the scheme achieves detection rates approaching the theoretical upper bound, outperforming existing methods while preserving text fluency and semantic coherence.

📝 Abstract
Large-language models (LLMs) are now able to produce text that is, in many cases, seemingly indistinguishable from human-generated content. This has fueled the development of watermarks that imprint a "signal" in LLM-generated text with minimal perturbation of an LLM's output. This paper provides an analysis of text watermarking in a one-shot setting. Through the lens of hypothesis testing with side information, we formulate and analyze the fundamental trade-off between watermark detection power and distortion in generated textual quality. We argue that a key component in watermark design is generating a coupling between the side information shared with the watermark detector and a random partition of the LLM vocabulary. Our analysis identifies the optimal coupling and randomization strategy under the worst-case LLM next-token distribution that satisfies a min-entropy constraint. We provide a closed-form expression of the resulting detection rate under the proposed scheme and quantify the cost in a max-min sense. Finally, we provide an array of numerical results, comparing the proposed scheme with the theoretical optimum and existing schemes, in both synthetic data and LLM watermarking. Our code is available at https://github.com/Carol-Long/CC_Watermark
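The coupling idea in the abstract can be made concrete with a toy green-list scheme: a shared key (the detector's side information) pseudorandomly partitions the vocabulary at each step, and generation is biased toward one half. This is a minimal sketch, not the paper's optimized coupling; the exponential-tilt biasing rule follows the common green-list heuristic, and all function names (`green_half`, `watermarked_sample`, `green_fraction`) are illustrative, not from the paper's code.

```python
import hashlib
import math
import random

def green_half(vocab, key, t):
    # Pseudorandom balanced partition of the vocabulary at step t,
    # reproducible by anyone holding the shared key (side information).
    rng = random.Random(hashlib.sha256(f"{key}:{t}".encode()).digest())
    shuffled = sorted(vocab)
    rng.shuffle(shuffled)
    return set(shuffled[: len(shuffled) // 2])

def watermarked_sample(probs, green, rng, delta=4.0):
    # Bias sampling toward the green half, then renormalize. This simple
    # exponential tilt stands in for the paper's optimized coupling
    # between side information and the vocabulary partition.
    w = {tok: p * (math.exp(delta) if tok in green else 1.0)
         for tok, p in probs.items()}
    z = sum(w.values())
    r, acc = rng.random() * z, 0.0
    for tok, weight in w.items():
        acc += weight
        if acc >= r:
            return tok
    return tok  # numerical fallback

def green_fraction(tokens, vocab, key):
    # Detection statistic: the fraction of tokens landing in the green
    # half. Under H0 (no watermark) it concentrates around 1/2.
    hits = sum(tok in green_half(vocab, key, t)
               for t, tok in enumerate(tokens))
    return hits / max(len(tokens), 1)
```

With a strong tilt, watermarked text lands in the green half far more than half the time, while unwatermarked text stays near 1/2, so a simple threshold on `green_fraction` separates the two hypotheses.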
Problem

Research questions and friction points this paper is trying to address.

Optimizing watermark detection power and text quality trade-off
Designing coupling between detector side information and vocabulary partition
Identifying optimal coupling under worst-case LLM distribution
Innovation

Methods, ideas, or system contributions that make the work stand out.

Optimal coupling between side information and vocabulary partition
Closed-form expression for watermark detection rate
Max-min analysis of watermarking cost
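The paper's closed-form detection rate is specific to its optimized coupling; as a generic stand-in, detecting a green-list watermark reduces to a binomial tail test, since under H0 each token lands in the green half with probability 1/2. A sketch of that p-value (illustrative, not the paper's formula):

```python
from math import comb

def binomial_p_value(hits, n):
    # P(Bin(n, 1/2) >= hits): probability that unwatermarked text,
    # with each token independently in the green half w.p. 1/2,
    # scores at least `hits` green tokens out of n.
    return sum(comb(n, k) for k in range(hits, n + 1)) / 2 ** n
```

Rejecting H0 when this p-value falls below a significance level gives a concrete false-positive guarantee for the count-based detector.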