Temperature Matters: Enhancing Watermark Robustness Against Paraphrasing Attacks

📅 2025-06-27
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Existing large language model (LLM) text watermarking techniques exhibit insufficient robustness against paraphrase attacks. To address this, we propose a temperature-aware statistical watermarking method that exploits the temperature sensitivity of the logits probability distribution during LLM autoregressive generation. Our approach embeds detectable yet semantically neutral implicit tokens via temperature-controlled sampling, and dynamically adjusts the temperature parameter to enhance watermark invariance against lexical substitution, syntactic restructuring, and other rewriting operations. Experiments demonstrate that our method maintains over 92% detection accuracy under diverse adversarial paraphrase attacks—significantly outperforming baseline approaches such as Aaron et al.—while preserving text quality and generation fluency. The primary contribution is the first systematic integration of temperature modulation into watermark design, achieving a unified balance among high robustness, low intrusiveness, and strong traceability.

Technology Category

Application Category

📝 Abstract
In the present-day scenario, Large Language Models (LLMs) are establishing their presence as powerful instruments permeating various sectors of society. While their utility offers valuable support to individuals, there are multiple concerns over potential misuse. Consequently, some academic endeavors have sought to introduce watermarking techniques, characterized by the inclusion of markers within machine-generated text, to facilitate algorithmic identification. This research project is focused on the development of a novel methodology for the detection of synthetic text, with the overarching goal of ensuring the ethical application of LLMs in AI-driven text generation. The investigation commences with replicating findings from a previous baseline study, thereby underscoring its susceptibility to variations in the underlying generation model. Subsequently, we propose an innovative watermarking approach and subject it to rigorous evaluation, employing paraphrased generated text to asses its robustness. Experimental results highlight the robustness of our proposal compared to the~cite{aarson} watermarking method.
Problem

Research questions and friction points this paper is trying to address.

Enhancing watermark robustness against paraphrasing attacks in LLMs
Detecting synthetic text to ensure ethical LLM application
Improving algorithmic identification of machine-generated content
Innovation

Methods, ideas, or system contributions that make the work stand out.

Novel watermarking for synthetic text detection
Robust against paraphrasing attacks
Improves ethical LLM application
🔎 Similar Papers
No similar papers found.
B
Badr Youbi Idrissi
M
Monica Millunzi
A
Amelia Sorrenti
L
Lorenzo Baraldi
Daryna Dementieva
Daryna Dementieva
TUM
NLPNLP for Social GoodHarmful Textual InformationMultilingualismResponsible AI