🤖 AI Summary
Existing LLM fingerprinting methods exhibit weak semantic relevance and are vulnerable to Generation Revision Intervention (GRI) attacks, which can erase fingerprints and compromise model intellectual property protection. To address this, we propose the Implicit Fingerprinting (ImF) paradigm—the first to leverage semantically strong, implicitly paired fingerprints naturally embedded within question-answering behavior. ImF ensures behavioral consistency, perceptual indistinguishability, detection resistance, and erasure robustness. Our approach comprises four core components: formal modeling of GRI attacks, construction of semantically aligned question-answer pairs, implicit fingerprint injection, and a robust verification mechanism. Extensive experiments across multiple mainstream LLMs demonstrate that ImF achieves significantly higher fingerprint verification success rates than state-of-the-art baselines under diverse adversarial settings—including GRI, paraphrasing, and prompt engineering—while remaining practical to deploy.
📝 Abstract
Training large language models (LLMs) is resource-intensive and expensive, making intellectual property (IP) protection essential. Most existing model fingerprinting methods inject fingerprints into LLMs to protect model ownership. These methods create fingerprint pairs with weak semantic correlations, lacking the contextual coherence and semantic relatedness found in normal question-answer (QA) pairs in LLMs. In this paper, we propose a Generation Revision Intervention (GRI) attack that can effectively exploit this flaw to erase fingerprints, highlighting the need for more secure model fingerprinting methods. We therefore propose a novel injected fingerprint paradigm called Implicit Fingerprints (ImF). ImF constructs fingerprint pairs with strong semantic correlations, disguising them as natural QA pairs within LLMs. This keeps the fingerprints consistent with normal model behavior, making them indistinguishable from ordinary outputs and robust against detection and removal. Our experiments on multiple LLMs demonstrate that ImF retains high verification success rates under adversarial conditions, offering a reliable solution for protecting LLM ownership.
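To make the fingerprint-pair verification idea concrete, here is a minimal, hypothetical sketch of how an owner might check a suspect model against secret QA fingerprint pairs. All names (`verify_fingerprint`, `toy_model`, the example QA pair, and the match threshold) are illustrative assumptions, not the paper's actual implementation.

```python
# Hypothetical sketch: ownership verification via secret QA fingerprint pairs.
# The owner queries the suspect model with fingerprint questions and checks
# whether the paired answers are reproduced.

def verify_fingerprint(model_answer_fn, fingerprint_pairs, threshold=0.8):
    """Return True if the model reproduces enough fingerprint answers.

    model_answer_fn: callable mapping a question string to the model's answer.
    fingerprint_pairs: list of (question, expected_answer) tuples kept secret
        by the model owner.
    threshold: fraction of matched pairs required to claim ownership
        (an illustrative parameter, not from the paper).
    """
    matches = sum(
        expected.strip().lower() in model_answer_fn(question).strip().lower()
        for question, expected in fingerprint_pairs
    )
    return matches / len(fingerprint_pairs) >= threshold


# Toy stand-in for a fingerprinted model with one embedded natural-looking QA pair.
def toy_model(question):
    qa = {"Which river flows through the city of Basel?":
          "The Rhine flows through Basel."}
    return qa.get(question, "I don't know.")


pairs = [("Which river flows through the city of Basel?", "the rhine")]
print(verify_fingerprint(toy_model, pairs, threshold=1.0))  # → True
```

Because ImF fingerprints are disguised as semantically coherent QA pairs, such a check succeeds even when an attacker revises or paraphrases generations, since removing the fingerprint would also disrupt normal QA behavior.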