Modification and Generated-Text Detection: Achieving Dual Detection Capabilities for the Outputs of LLM by Watermark

📅 2025-02-12
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
To address the risk that maliciously tampered LLM outputs may still pass watermark verification, leading to erroneous attribution, this work proposes a dual-check watermarking mechanism that jointly enables generated-text detection and modification detection. Methodologically, it employs bias-free watermark embedding; introduces the novel "discarded tokens" metric for tampering-sensitive, unbiased detection; constructs a tampering-aware module based on token-distribution shift; and enhances watermark extraction and statistical significance testing. Evaluated across multiple LLM output datasets, the method achieves >99% accuracy in generated-text detection and a >95% modification detection rate, significantly outperforming existing single-objective watermarking schemes. To our knowledge, this is the first watermarking framework that, while preserving watermark non-removability, simultaneously ensures reliable provenance tracing and explicit tampering identifiability.

📝 Abstract
The development of large language models (LLMs) has raised concerns about potential misuse. One practical solution is to embed a watermark in the text, allowing ownership verification through watermark extraction. Existing methods primarily focus on defending against modification attacks, often neglecting other spoofing attacks. For example, attackers can alter the watermarked text to produce harmful content without compromising the presence of the watermark, which could lead to false attribution of this malicious content to the LLM. This situation poses a serious threat to LLM service providers and highlights the significance of achieving modification detection and generated-text detection simultaneously. Therefore, we propose a modification-sensitive technique to detect modifications in text carrying an unbiased watermark. We introduce a new metric called "discarded tokens", which measures the number of tokens not included in watermark detection. When a modification occurs, this metric changes and can serve as evidence of the modification. Additionally, we improve the watermark detection process and introduce a novel method for unbiased watermarking. Our experiments demonstrate that we can achieve effective dual detection capabilities by watermark: modification detection and generated-text detection.
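The "discarded tokens" idea from the abstract can be illustrated with a toy sketch. This is not the paper's actual algorithm: the keyed-hash partition, the key names, and the scoring rule below are all illustrative assumptions. The point being demonstrated is that a detector may score only a pseudorandomly selected subset of tokens, count the rest as "discarded", and treat a shift in the discarded count after editing as evidence of modification.

```python
import hashlib

def keyed_bit(key: str, context: int, token: int) -> int:
    # Pseudorandom bit derived from a secret key, the preceding token,
    # and the candidate token (a stand-in for the watermark's PRF).
    h = hashlib.sha256(f"{key}:{context}:{token}".encode()).digest()
    return h[0] & 1

def detect(tokens: list[int], key: str = "secret") -> dict:
    """Toy detector: only tokens the keyed test marks eligible are scored;
    the rest are *discarded*. The discarded count itself is the
    tampering-sensitive statistic, alongside the usual green-token count."""
    scored = discarded = green = 0
    for prev, tok in zip(tokens, tokens[1:]):
        if keyed_bit(key, prev, tok) == 0:   # token excluded from detection
            discarded += 1
            continue
        scored += 1
        green += keyed_bit(key + ":g", prev, tok)  # toy "green list" test
    return {"scored": scored, "discarded": discarded, "green": green}
```

Because eligibility depends on each token and its predecessor, replacing even one token perturbs which positions are discarded, so an edited sequence tends to produce a different discarded count than the generator committed to at embedding time.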
Problem

Research questions and friction points this paper is trying to address.

Detect text modifications in LLM outputs
Identify generated text using watermarking
Prevent false attribution of malicious content
Innovation

Methods, ideas, or system contributions that make the work stand out.

watermark embedding
modification detection
unbiased watermark method
Yuhang Cai
School of Computer Science, Hefei University of Technology, Hefei
Yaofei Wang
Hefei University of Technology
information hiding · steganography · steganalysis
Donghui Hu
School of Computer Science, Hefei University of Technology, Hefei
Gu Chen
School of Computer Science, Hefei University of Technology, Hefei