🤖 AI Summary
This paper identifies a "contextual vulnerability" in word-level differentially private (DP) text sanitization: large language models (LLMs) can infer original semantics from residual contextual cues in sanitized text, undermining its empirical privacy protection. To address this, the paper proposes the first LLM-based adversarial reconstruction post-processing method, which improves both privacy protection and textual utility while leaving the formal DP guarantee intact. Through systematic evaluation of multiple DP mechanisms across varying privacy budgets, the study empirically demonstrates that LLMs play a dual role: they act as privacy adversaries capable of reconstructing sensitive semantics, and as quality enhancers that improve the coherence and semantic fidelity of sanitized text. This work provides the first systematic characterization of the "double-edged sword" effect of LLM-driven data reconstruction in DP text sanitization and establishes a new paradigm for balancing the privacy-utility trade-off in differentially private natural language processing.
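The post-processing idea summarized above can be sketched as follows: an LLM is prompted to reconstruct the most plausible original from a DP-sanitized text, and that reconstruction is released instead. Because the model only ever sees the sanitized text, this is DP post-processing and cannot weaken the formal guarantee. The `call_llm` function below is a hypothetical stand-in, stubbed so the example runs; a real system would call an actual model API here.

```python
# Hedged sketch: LLM reconstruction as a DP post-processing step.
# `call_llm` is a stand-in for a real chat/completion API call (an assumption,
# not the paper's implementation); it is stubbed so the example executes.

def call_llm(prompt: str) -> str:
    # Stub: echo the text after the final newline, lightly "smoothed".
    # A real LLM would return its best-guess reconstruction instead.
    text = prompt.rsplit("\n", 1)[-1]
    return text.capitalize().rstrip(".") + "."

def reconstruct(sanitized_text: str) -> str:
    # The model sees only sanitized text, so the DP budget is unchanged.
    prompt = (
        "The following text was privatized word-by-word under differential "
        "privacy. Rewrite it as the most plausible coherent original:\n"
        + sanitized_text
    )
    return call_llm(prompt)

print(reconstruct("the attorney visited a clinic yesterday"))
# With the stub above, prints: The attorney visited a clinic yesterday.
```

The same call serves both roles identified in the paper: run by an adversary, it probes how much original semantics survives sanitization; run by the data holder, its output is a more coherent release at no additional privacy cost.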
📝 Abstract
Differentially private text sanitization refers to the process of privatizing texts under the framework of Differential Privacy (DP), providing provable privacy guarantees while also empirically defending against adversaries seeking to harm privacy. Despite their simplicity, DP text sanitization methods operating at the word level exhibit a number of shortcomings, among them the tendency to leave contextual clues from the original texts due to randomization during sanitization – this we refer to as *contextual vulnerability*. Given the powerful contextual understanding and inference capabilities of Large Language Models (LLMs), we explore to what extent LLMs can be leveraged to exploit the contextual vulnerability of DP-sanitized texts. We expand on previous work not only in the use of advanced LLMs, but also in testing a broader range of sanitization mechanisms at various privacy levels. Our experiments uncover a double-edged sword effect of LLM-based data reconstruction attacks on privacy and utility: while LLMs can indeed infer original semantics and sometimes degrade empirical privacy protections, they can also be used for good, to improve the quality and privacy of DP-sanitized texts. Based on our findings, we propose recommendations for using LLM data reconstruction as a post-processing step, serving to increase privacy protection by thinking adversarially.
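Word-level DP mechanisms of the kind studied here typically privatize each token independently, for example by adding calibrated noise to the token's embedding and snapping back to the nearest vocabulary word; the neighboring words that survive this per-token process are exactly the contextual clues an LLM can exploit. A minimal sketch of such a metric-DP word mechanism follows — the toy vocabulary, 2-D embeddings, and noise form are illustrative assumptions, not the specific mechanisms evaluated in the paper.

```python
import math
import random

# Toy 2-D "embeddings"; real mechanisms use high-dimensional word vectors.
VOCAB = {
    "doctor":  (0.9, 0.1),
    "nurse":   (0.8, 0.2),
    "teacher": (0.1, 0.9),
    "lawyer":  (0.5, 0.5),
}

def planar_laplace_noise(epsilon, rng):
    # 2-D noise with a uniform direction and a Gamma(2, 1/epsilon)-distributed
    # magnitude, as used by metric-DP mechanisms over embedding spaces;
    # smaller epsilon means larger expected noise.
    theta = rng.uniform(0.0, 2.0 * math.pi)
    radius = rng.gammavariate(2.0, 1.0 / epsilon)
    return (radius * math.cos(theta), radius * math.sin(theta))

def sanitize_word(word, epsilon, rng):
    # Perturb the word's embedding, then snap to the nearest vocabulary word.
    dx, dy = planar_laplace_noise(epsilon, rng)
    noisy = (VOCAB[word][0] + dx, VOCAB[word][1] + dy)
    return min(VOCAB, key=lambda w: math.dist(VOCAB[w], noisy))

rng = random.Random(0)
sentence = ["doctor", "teacher", "lawyer"]
print([sanitize_word(w, epsilon=1.0, rng=rng) for w in sentence])
```

Because each word is sanitized in isolation, words that come through unperturbed (or are replaced by close neighbors) leak context about the rest of the sentence — the vulnerability that a reconstruction adversary targets.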