PRIVMARK: Private Large Language Models Watermarking with MPC

📅 2025-09-29
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing watermarking techniques for large language model (LLM) content provenance pose privacy risks, as they typically require access to model parameters or training data. Method: We propose PRIVMARK, the first privacy-preserving watermarking framework for LLMs based on secure multi-party computation (MPC). PRIVMARK enables multiple parties to collaboratively generate watermarks without any single party learning the model weights, leveraging SecretFlow-SPU and the ABY3 backend to implement efficient cryptographic protocols tailored to the PostMark watermarking method. Contribution/Results: Experiments show that PRIVMARK matches the plaintext baseline in semantic coherence and in robustness against paraphrasing and watermark-removal attacks, while providing strong privacy guarantees. This work is the first application of MPC to LLM watermarking, establishing a foundation for privacy-aware content attribution.

📝 Abstract
The rapid growth of Large Language Models (LLMs) has highlighted the pressing need for reliable mechanisms to verify content ownership and ensure traceability. Watermarking offers a promising path forward, but it remains limited by privacy concerns in sensitive scenarios, as traditional approaches often require direct access to a model's parameters or its training data. In this work, we propose a secure multi-party computation (MPC)-based private LLM watermarking framework, PRIVMARK, to address these concerns. Concretely, we investigate PostMark (EMNLP'2024), one of the state-of-the-art LLM watermarking methods, and formulate its basic operations. Then, we construct efficient protocols for these operations using MPC primitives in a black-box manner. In this way, PRIVMARK enables multiple parties to collaboratively watermark an LLM's output without exposing the model's weights to any single computing party. We implement PRIVMARK using SecretFlow-SPU (USENIX ATC'2023) and evaluate its performance with the ABY3 (CCS'2018) backend. The experimental results show that PRIVMARK achieves semantically identical results compared to the plaintext baseline without MPC and resists paraphrasing and removal attacks with reasonable efficiency.
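For intuition about the underlying watermarking scheme, PostMark-style detection can be approximated as a presence check: given the secret list of watermark words selected for an output, count what fraction of those words appear in a candidate text and compare against a threshold. A minimal sketch follows; the word list, threshold, and whitespace tokenization are illustrative assumptions, not PostMark's actual implementation (which selects words with a neural embedder and inserts them via an LLM):

```python
# Toy PostMark-style presence detection. Assumption: real PostMark selects
# watermark words semantically and inserts them with an LLM; here we only
# sketch the detection side with naive whitespace tokenization.

def detect_watermark(text, watermark_words, threshold=0.5):
    """Return True if enough of the watermark words appear in the text."""
    tokens = set(text.lower().split())
    present = sum(1 for w in watermark_words if w in tokens)
    return present / len(watermark_words) >= threshold

words = ["lantern", "granite", "meadow", "orchard"]  # hypothetical secret list
watermarked = "the lantern by the granite wall lit the meadow"
plain = "the wall lit the field at dusk"
```

Here `detect_watermark(watermarked, words)` succeeds (3 of 4 words present) while `detect_watermark(plain, words)` fails, mirroring the presence-ratio test described for PostMark.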
Problem

Research questions and friction points this paper is trying to address.

Enables private watermarking of LLM outputs without exposing model parameters
Addresses privacy concerns in sensitive watermarking scenarios using MPC
Allows collaborative watermarking while protecting model weights from disclosure
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses secure multi-party computation for watermarking
Enables collaborative watermarking without exposing model weights
Achieves semantically identical results to the plaintext baseline under MPC
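The MPC principle behind these points can be illustrated with additive secret sharing over a ring, the basic building block of replicated-sharing protocols such as ABY3. The sketch below is illustrative only: the 64-bit ring and three-party split are assumptions for demonstration, and SecretFlow-SPU's actual protocols (share conversion, multiplication, fixed-point arithmetic) are far more involved.

```python
# Minimal sketch of 3-party additive secret sharing over Z_{2^64}.
# Assumption: this demonstrates the privacy principle only, not the
# ABY3 replicated-sharing protocol used by SecretFlow-SPU.
import secrets

MOD = 2**64  # ring size, chosen for illustration

def share(x, n=3):
    """Split x into n random shares that sum to x mod MOD."""
    shares = [secrets.randbelow(MOD) for _ in range(n - 1)]
    shares.append((x - sum(shares)) % MOD)
    return shares

def reconstruct(shares):
    """Recombine shares; requires all parties' contributions."""
    return sum(shares) % MOD

# Each party holds one share; no single share reveals the secret,
# e.g. a model weight or embedding entry encoded as an integer.
w = 1234567
assert reconstruct(share(w)) == w

# Addition of two secrets is purely local: parties add their own shares.
a, b = share(100), share(23)
c = [(ai + bi) % MOD for ai, bi in zip(a, b)]
assert reconstruct(c) == 123
```

Because linear operations stay local like this, the parties can jointly evaluate the watermarking computation while the model weights remain split across them at all times.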
Thomas Fargues
Télécom SudParis, France
Ye Dong
Research Fellow, National University of Singapore (NUS)
Secure Multi-Party Computation · Privacy-Preserving Machine Learning · Federated Learning
Tianwei Zhang
Nanyang Technological University, Singapore
Jin-Song Dong
National University of Singapore