Robust LLM Fingerprinting via Domain-Specific Watermarks

📅 2025-05-22
🤖 AI Summary
To address the challenge of reliably identifying the provenance of open-source large language models (LLMs), this paper proposes a domain-specific watermarking mechanism. Unlike conventional backdoor-based fingerprints, the method embeds lightweight, verifiable, conditional watermarks only within designated subdomains (e.g., particular languages or topics), establishing the first domain-adaptive watermarking paradigm. This design mitigates the failure of generic watermarks under realistic perturbations, including fine-tuning, quantization, and API-based distillation. The approach combines controllable text generation fine-tuning, subdomain-conditioned embedding, and a statistical hypothesis testing framework for watermark verification. Under multiple rounds of fine-tuning and other practical adversarial transformations, it achieves >99% detection accuracy with a false positive rate below 10^-4 while preserving the original generation quality, as evidenced by unchanged BLEU scores and LLM-as-a-Judge evaluations.

📝 Abstract
As open-source language models (OSMs) grow more capable and are widely shared and finetuned, ensuring model provenance, i.e., identifying the origin of a given model instance, has become an increasingly important issue. At the same time, existing backdoor-based model fingerprinting techniques often fall short of achieving key requirements of real-world model ownership detection. In this work, we build on the observation that while current open-source model watermarks fail to achieve reliable content traceability, they can be effectively adapted to address the challenge of model provenance. To this end, we introduce the concept of domain-specific watermarking for model fingerprinting. Rather than watermarking all generated content, we train the model to embed watermarks only within specified subdomains (e.g., particular languages or topics). This targeted approach ensures detection reliability, while improving watermark durability and quality under a range of real-world deployment settings. Our evaluations show that domain-specific watermarking enables model fingerprinting with strong statistical guarantees, controllable false positive rates, high detection power, and preserved generation quality. Moreover, we find that our fingerprints are inherently stealthy and naturally robust to real-world variability across deployment scenarios.

Problem

Research questions and friction points this paper is trying to address.

Identifying the origin of widely shared and fine-tuned open-source models (OSMs)
Improving unreliable backdoor-based fingerprinting techniques
Ensuring watermark durability and detection reliability

Innovation

Methods, ideas, or system contributions that make the work stand out.

Domain-specific watermarking for model fingerprinting
Watermarks embedded only in specified subdomains
Strong statistical guarantees and high detection power
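
To make the "statistical guarantees" point concrete, here is a minimal sketch of how a watermark-based fingerprint check with a controlled false positive rate could look. It is not the paper's implementation: it assumes a red/green-list style watermark, and the names `GAMMA`, `is_green`, `count_green`, `binomial_tail`, and `fingerprint_test`, the hash-based green-list rule, and the secret key are all hypothetical choices made for illustration. The idea is to query the suspect model only with prompts from the watermarked subdomain, pool the token statistics across responses, and reject the null hypothesis "no watermark" when an exact binomial p-value falls below the chosen false positive rate.

```python
import hashlib
from math import exp, lgamma, log

GAMMA = 0.25  # assumed fraction of tokens that are "green" under the null hypothesis


def is_green(prev_token: str, token: str, key: str = "demo-key") -> bool:
    """Hypothetical green-list rule: hash (key, previous token, token) to [0, 1)."""
    digest = hashlib.sha256(f"{key}|{prev_token}|{token}".encode()).digest()
    return int.from_bytes(digest[:8], "big") / 2**64 < GAMMA


def count_green(tokens, key="demo-key"):
    """Return (green count, scored count) over consecutive token pairs."""
    pairs = list(zip(tokens, tokens[1:]))
    return sum(is_green(a, b, key) for a, b in pairs), len(pairs)


def binomial_tail(k: int, n: int, p: float) -> float:
    """P[X >= k] for X ~ Binomial(n, p), summed in log space for stability."""
    total = 0.0
    for i in range(k, n + 1):
        log_pmf = (lgamma(n + 1) - lgamma(i + 1) - lgamma(n - i + 1)
                   + i * log(p) + (n - i) * log(1 - p))
        total += exp(log_pmf)
    return min(1.0, total)


def fingerprint_test(responses, alpha=1e-4, key="demo-key"):
    """Pool green counts over in-domain responses and test at level alpha.

    `responses` is a list of token lists produced from prompts in the
    watermarked subdomain. Returns (p_value, flagged).
    """
    green = scored = 0
    for tokens in responses:
        g, n = count_green(tokens, key)
        green += g
        scored += n
    p_value = binomial_tail(green, scored, GAMMA)
    return p_value, p_value < alpha
```

Under the null hypothesis each scored token is green with probability `GAMMA` (treating tokens as independent), so a model is flagged with probability at most `alpha`; this is what "controllable false positive rate" means in this sketch. Restricting the queries to the watermarked subdomain keeps unwatermarked text from diluting the pooled count.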