CoSPED: Consistent Soft Prompt Targeted Data Extraction and Defense

📅 2025-10-13
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address privacy data extraction risks posed by soft prompts in large language models (LLMs), this paper proposes CoSPED, a consistency-driven framework that unifies soft prompt optimization and defense verification. Methodologically, it integrates dynamic, additive, and common losses, stabilizes extraction via self-consistency decoding, and employs rank-one model editing for a lightweight, verifiable defense. Evaluated across diverse models (e.g., Pythia) and scenarios, CoSPED achieves a 65.2% extraction rate at a 50-token prefix comparison (51.7% on Pythia), which drops to 1.6% after the defense is applied, substantially outperforming prior approaches. Its core contribution is the first consistency-based soft prompt evaluation framework that couples efficient extraction testing with a directly informed, verifiable defense.

📝 Abstract
Large language models (LLMs) have gained widespread attention recently, but their potential security vulnerabilities, especially privacy leakage, are also becoming apparent. To test and evaluate data extraction risks in LLMs, we propose CoSPED, short for Consistent Soft Prompt targeted data Extraction and Defense. We introduce several innovative components, including Dynamic Loss, Additive Loss, Common Loss, and a Self-Consistency Decoding Strategy, and test them to enhance the consistency of the soft prompt tuning process. Through extensive experimentation with various combinations, we achieve an extraction rate of 65.2% at a 50-token prefix comparison. Comparisons of CoSPED with related work confirm its superior extraction rates. We further evaluate CoSPED in additional scenarios, achieving a 51.7% extraction rate on the Pythia model and introducing a cross-model comparison. Finally, we explore defense through Rank-One Model Editing and reduce the extraction rate to 1.6%, which shows that our analysis of extraction mechanisms can directly inform effective mitigation strategies against soft prompt-based attacks.
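The page does not detail the Self-Consistency Decoding Strategy. One plausible reading, sketched below purely as an illustration, is to sample several continuations under the same soft prompt and majority-vote each token position; the function and sample data are hypothetical, not the paper's implementation:

```python
from collections import Counter

def self_consistent_decode(candidate_sequences):
    """Majority-vote each token position across sampled continuations.

    candidate_sequences: list of token lists, each produced by decoding
    the model once under the same tuned soft prompt.
    """
    length = min(len(seq) for seq in candidate_sequences)
    consensus = []
    for pos in range(length):
        votes = Counter(seq[pos] for seq in candidate_sequences)
        token, _ = votes.most_common(1)[0]
        consensus.append(token)
    return consensus

samples = [
    ["the", "secret", "key", "is", "42"],
    ["the", "secret", "key", "was", "42"],
    ["the", "secret", "key", "is", "42"],
]
result = self_consistent_decode(samples)
# result == ["the", "secret", "key", "is", "42"]
```

The intuition matching the abstract: tokens the model emits consistently across samples are more likely memorized training data than sampling noise.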
Problem

Research questions and friction points this paper is trying to address.

Addressing privacy leakage vulnerabilities in large language models
Developing methods to test data extraction risks in LLMs
Creating defense strategies against soft prompt-based attacks
Innovation

Methods, ideas, or system contributions that make the work stand out.

Dynamic Loss enhances soft prompt tuning consistency
Additive Loss improves targeted data extraction performance
Rank-One Model Editing reduces extraction rate significantly
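The page does not spell out how the loss terms interact during soft prompt tuning. As a hedged sketch only, optimizing a soft prompt against a weighted sum of dynamic, additive, and common terms could look like the toy example below; the quadratic stand-ins, weights, and names are all assumptions replacing the paper's model-based losses:

```python
import random

random.seed(0)
DIM = 8
# Toy extraction target standing in for the model's memorized continuation.
target = [random.gauss(0, 1) for _ in range(DIM)]

def dynamic_loss(p):      # stand-in: distance to the current decoding target
    return sum((pi - ti) ** 2 for pi, ti in zip(p, target))

def additive_loss(p):     # stand-in: penalize drift of the soft prompt itself
    return 0.01 * sum(pi ** 2 for pi in p)

def common_loss(p):       # stand-in: regularizer shared across targets
    return 0.01 * sum(abs(pi) for pi in p)

def combined_loss(p, w=(1.0, 0.5, 0.5)):
    return w[0] * dynamic_loss(p) + w[1] * additive_loss(p) + w[2] * common_loss(p)

def numeric_grad(f, p, eps=1e-5):
    g = []
    for i in range(len(p)):
        hi, lo = p[:], p[:]
        hi[i] += eps
        lo[i] -= eps
        g.append((f(hi) - f(lo)) / (2 * eps))
    return g

# Gradient descent on the soft prompt embedding (numeric gradient for the sketch).
prompt = [0.0] * DIM
for _ in range(200):
    g = numeric_grad(combined_loss, prompt)
    prompt = [pi - 0.05 * gi for pi, gi in zip(prompt, g)]
```

In the real attack the gradient would come from backpropagation through the frozen LLM, with only the soft prompt embeddings updated; the sketch only shows the shape of the multi-term objective.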
Yang Zhuochen
Cybersecurity Strategic Technology Centre, ST Engineering, Singapore, Singapore
Fok Kar Wai
Cybersecurity Strategic Technology Centre, ST Engineering, Singapore, Singapore
Vrizlynn Thing
Cybersecurity · Digital Forensics · Artificial Intelligence · Security Analytics · Communications