Provably Robust Multi-bit Watermarking for AI-generated Text

📅 2024-01-30

📈 Citations: 1

✨ Influential: 0

career value

192K/year

🤖 AI Summary

Existing LLM text watermarking techniques suffer from insufficient edit robustness and low accuracy in embedding long messages. To address this, we propose the first multi-bit watermarking framework based on pseudo-random segment allocation. Our method integrates adaptive token selection, redundant encoding, and checksum verification, and establishes a theoretical model for edit robustness—providing, for the first time, provable segment-level fault tolerance (average edit distance tolerance: 17). Experiments show that embedding a 20-bit user ID into 200-token texts achieves a matching accuracy of 97.6%, substantially outperforming the state-of-the-art (49.2%). Moreover, the watermark maintains high extraction accuracy under diverse adversarial edits—including token substitution, deletion, and insertion. This work establishes a new paradigm for AI-generated content provenance, uniquely combining rigorous theoretical guarantees with practical performance.

Technology Category

Application Category

📝 Abstract

Large Language Models (LLMs) have demonstrated remarkable capabilities of generating texts resembling human language. However, they can be misused by criminals to create deceptive content, such as fake news and phishing emails, which raises ethical concerns. Watermarking is a key technique to address these concerns, which embeds a message (e.g., a bit string) into a text generated by an LLM. By embedding the user ID (represented as a bit string) into generated texts, we can trace generated texts to the user, known as content source tracing. The major limitation of existing watermarking techniques is that they achieve sub-optimal performance for content source tracing in real-world scenarios. The reason is that they cannot accurately or efficiently extract a long message from a generated text. We aim to address the limitations. In this work, we introduce a new watermarking method for LLM-generated text grounded in pseudo-random segment assignment. We also propose multiple techniques to further enhance the robustness of our watermarking algorithm. We conduct extensive experiments to evaluate our method. Our experimental results show that our method substantially outperforms existing baselines in both accuracy and robustness on benchmark datasets. For instance, when embedding a message of length 20 into a 200-token generated text, our method achieves a match rate of $97.6%$, while the state-of-the-art work Yoo et al. only achieves $49.2%$. Additionally, we prove that our watermark can tolerate edits within an edit distance of 17 on average for each paragraph under the same setting.

Problem

Research questions and friction points this paper is trying to address.

Watermarking Techniques

Large Language Models

Text Attribution

Innovation

Methods, ideas, or system contributions that make the work stand out.

Multilevel Watermarking

Large Language Models

Content Security

🔎 Similar Papers

From Intentions to Techniques: A Comprehensive Taxonomy and Challenges in Text Watermarking for Large Language Models