MergePrint: Merge-Resistant Fingerprints for Robust Black-box Ownership Verification of Large Language Models

📅 2024-10-11

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Black-box ownership verification for large language models (LLMs) fails under model-merging attacks, which undermines intellectual property protection. This paper proposes what it describes as the first merge-resistant fingerprinting mechanism. The method improves fingerprint robustness against merging through gradient-driven pre-optimization of fingerprint inputs, pseudo-merging modeling, and adversarial training. It also includes a black-box response-matching detection algorithm that verifies attribution solely from input-output pairs. Experiments report >96% fingerprint detection accuracy across mainstream merging strategies, including linear, task-arithmetic, and TIES-merging, with <0.5% degradation in downstream task performance, outperforming existing approaches. The authors position the work as a deployable approach to LLM intellectual property protection that remains robust when a protected model is merged.
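
The summary names linear and task-arithmetic merging without showing what they compute. The sketch below illustrates the two operations on toy weights, under stated assumptions: models are plain dictionaries mapping parameter names to lists of floats, and the function names `linear_merge` and `task_arithmetic_merge` are hypothetical, not taken from the paper or any library. TIES-merging is omitted because it adds trimming and sign-resolution steps.

```python
from typing import Dict, List

# Toy representation (assumption): parameter name -> flat list of floats.
Weights = Dict[str, List[float]]


def linear_merge(models: List[Weights], coeffs: List[float]) -> Weights:
    """Weighted sum of parameters across models that share keys and shapes."""
    if len(models) != len(coeffs) or not models:
        raise ValueError("need one coefficient per model and at least one model")
    merged: Weights = {}
    for name, first_vals in models[0].items():
        merged[name] = [
            sum(c * m[name][i] for m, c in zip(models, coeffs))
            for i in range(len(first_vals))
        ]
    return merged


def task_arithmetic_merge(base: Weights, experts: List[Weights], scale: float) -> Weights:
    """base + scale * sum over experts of the task vector (expert - base)."""
    merged: Weights = {}
    for name, base_vals in base.items():
        merged[name] = [
            b + scale * sum(e[name][i] - b for e in experts)
            for i, b in enumerate(base_vals)
        ]
    return merged


if __name__ == "__main__":
    base = {"w": [1.0, 2.0]}
    expert_a = {"w": [2.0, 2.0]}
    expert_b = {"w": [1.0, 4.0]}
    print(linear_merge([expert_a, expert_b], [0.5, 0.5]))
    print(task_arithmetic_merge(base, [expert_a, expert_b], 0.5))
```

With a single expert, `task_arithmetic_merge` interpolates between the base and that expert, which is one plausible way a merged model could be simulated during fingerprint optimization; whether the paper's pseudo-merged model takes exactly this form is an assumption here.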

📝 Abstract

Protecting the intellectual property of Large Language Models (LLMs) has become increasingly critical due to the high cost of training. Model merging, which integrates multiple expert models into a single multi-task model, introduces a novel risk of unauthorized use of LLMs due to its efficient merging process. While fingerprinting techniques have been proposed for verifying model ownership, their resistance to model merging remains unexplored. To address this gap, we propose a novel fingerprinting method, MergePrint, which embeds robust fingerprints capable of surviving model merging. MergePrint enables black-box ownership verification, where owners only need to check if a model produces target outputs for specific fingerprint inputs, without accessing model weights or intermediate outputs. By optimizing against a pseudo-merged model that simulates merged behavior, MergePrint ensures fingerprints that remain detectable after merging. Additionally, to minimize performance degradation, we pre-optimize the fingerprint inputs. MergePrint pioneers a practical solution for black-box ownership verification, protecting LLMs from misappropriation via merging, while also excelling in resistance to broader model theft threats.
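
The black-box check described in the abstract, where the owner only tests whether a model produces target outputs for specific fingerprint inputs, can be sketched as follows. This is a minimal illustration, not the paper's detection algorithm: the `query_model` callable, the substring-matching rule, and the `min_match_rate` threshold are all assumptions introduced for the example.

```python
from typing import Callable, List, Tuple


def verify_ownership(
    query_model: Callable[[str], str],
    fingerprints: List[Tuple[str, str]],
    min_match_rate: float = 0.5,
) -> Tuple[bool, float]:
    """Query a suspect model with fingerprint inputs and count target matches.

    query_model: hypothetical text-in, text-out access to the suspect model
                 (for example, a thin wrapper around an API call).
    fingerprints: (fingerprint_input, target_output) pairs held by the owner.
    min_match_rate: assumed decision threshold, not a value from the paper.
    """
    if not fingerprints:
        raise ValueError("at least one fingerprint pair is required")
    # A pair counts as a hit when the target appears in the response.
    hits = sum(1 for prompt, target in fingerprints if target in query_model(prompt))
    rate = hits / len(fingerprints)
    return rate >= min_match_rate, rate


if __name__ == "__main__":
    # Stand-in for a suspect model: answers one fingerprint, ignores the other.
    canned = {"fp-input-1": "xx secret-token-1 yy"}
    suspect = lambda prompt: canned.get(prompt, "unrelated answer")
    pairs = [("fp-input-1", "secret-token-1"), ("fp-input-2", "secret-token-2")]
    print(verify_ownership(suspect, pairs))
```

Only input-output access is used, which matches the black-box setting: no model weights or intermediate outputs are inspected.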

Problem

Research questions and friction points this paper is trying to address.

Protecting LLMs from unauthorized merging

Ensuring robust fingerprints post-model merging

Enabling black-box ownership verification

Innovation

Methods, ideas, or system contributions that make the work stand out.

Robust fingerprints for LLMs

Black-box ownership verification

Resistance to model merging

🔎 Similar Papers

A Fingerprint for Large Language Models