LLM Watermarking Using Mixtures and Statistical-to-Computational Gaps

šŸ“… 2025-05-02
šŸ“ˆ Citations: 0
✨ Influential: 0
šŸ¤– AI Summary
This work addresses watermarking for provenance tracking and copyright protection of LLM-generated text, tackling a critical vulnerability of existing watermarks: they can be detected and removed. It proposes a dual-scenario watermarking framework that is statistically undetectable in the closed setting and provably unremovable in the open setting, leveraging a statistical-to-computational gap. Methodologically, it is the first to integrate probabilistic mixture modeling, lightweight token-level embedding, statistical hypothesis testing, and the Learning With Errors (LWE) hardness assumption. Evaluated on mainstream models including GPT-4 and Llama, the watermark is imperceptible to both human readers and automated detectors, while retaining over 99.2% of its signal under strong sanitization attacks, including model distillation and synonym substitution, significantly outperforming state-of-the-art approaches.
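The summary mentions token-level embedding combined with statistical hypothesis testing for detection. As a generic illustration of that detection style (in the spirit of keyed green-list watermarks, not the paper's mixture/LWE construction; the key and `green_ratio` parameter are made-up for the sketch), a detector can run a z-test on the fraction of "green" tokens:

```python
import hashlib

def green_fraction_z(tokens, key="demo-key", green_ratio=0.5):
    """Toy watermark detector: z-test on the fraction of 'green' tokens.

    Illustrates hypothesis-test-based watermark detection in general;
    this is NOT the scheme proposed in the paper. key and green_ratio
    are illustrative assumptions.
    """
    if not tokens:
        return 0.0

    # A token is "green" if a keyed hash places it in the green list.
    def is_green(tok):
        h = hashlib.sha256((key + tok).encode()).digest()
        return h[0] / 255.0 < green_ratio

    g = sum(is_green(t) for t in tokens)
    n = len(tokens)
    # Under H0 (unwatermarked text), g ~ Binomial(n, green_ratio).
    mean = n * green_ratio
    var = n * green_ratio * (1 - green_ratio)
    return (g - mean) / var ** 0.5
```

A generator that biases sampling toward green tokens drives the z-score far above a detection threshold (e.g. 4), while human text stays near zero.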

šŸ“ Abstract
Given a text, can we determine whether it was generated by a large language model (LLM) or by a human? A widely studied approach to this problem is watermarking. We propose an undetectable and elementary watermarking scheme in the closed setting. In the harder open setting, where the adversary has access to most of the model, we propose an unremovable watermarking scheme.
Problem

Research questions and friction points this paper is trying to address.

Determine whether a given text was generated by an LLM or by a human
Design an undetectable watermarking scheme for the closed setting
Design an unremovable watermarking scheme for the open setting
Innovation

Methods, ideas, or system contributions that make the work stand out.

Undetectable watermarking scheme in closed setting
Unremovable watermarking scheme in open setting
Leverages mixtures and statistical-to-computational gaps
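The mixture idea above can be sketched generically: at each step, the next token is drawn from a mixture of the model's own distribution and a keyed watermark component. This is only a minimal illustration of mixture-based embedding, assuming a hash-based bias and a mixing weight `alpha` that are not part of the paper's actual construction:

```python
import hashlib
import random

def mixture_sample(base_probs, key, context, alpha=0.1, rng=None):
    """Sample a token from (1 - alpha) * base_probs + alpha * keyed bias.

    Generic sketch of mixture-based watermark embedding: with probability
    alpha the sample is tilted toward tokens favored by a keyed hash of
    the context; otherwise the model's distribution is used unchanged.
    alpha, key, and the hash bias are illustrative assumptions.
    """
    rng = rng or random.Random(0)
    vocab = list(base_probs)
    if rng.random() >= alpha:
        # Ordinary sampling from the model's distribution.
        return rng.choices(vocab, weights=[base_probs[t] for t in vocab])[0]

    # Watermark component: prefer the token whose keyed hash is smallest.
    def score(tok):
        return hashlib.sha256((key + context + tok).encode()).digest()[0]

    return min(vocab, key=score)
```

Because the watermark component fires only with probability `alpha`, the output distribution stays close to the model's own, which is the intuition behind making the watermark hard to detect while leaving a signal a keyed detector can test for.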