🤖 AI Summary
This work addresses the fundamental trade-off between privacy and utility in differential privacy (DP) text rewriting, where injected noise inevitably distorts meaning. The authors propose a sentence-level infilling privatization framework and use it to conduct a controlled empirical study isolating noise as a primary cause of DP utility loss. Through a systematic comparison of DP and non-DP text sanitization methods, they find that standard DP perturbations substantially degrade BLEU, BERTScore, and downstream task accuracy, whereas non-DP methods, though lacking formal privacy guarantees, achieve superior textual quality and an acceptable empirical privacy-utility trade-off, even if they cannot outperform DP methods in empirical privacy protection (e.g., robustness against membership inference attacks). The study establishes an empirical benchmark for NLP privacy research, prompts methodological reflection on the practicality of strict DP in language applications, and introduces a utility-aware privatization paradigm grounded in controllable sentence infilling.
📝 Abstract
The field of text privatization often leverages the notion of $\textit{Differential Privacy}$ (DP) to provide formal guarantees in the rewriting or obfuscation of sensitive textual data. A common and nearly ubiquitous form of DP application necessitates the addition of calibrated noise, governed by the privacy parameter $\varepsilon$, to vector representations of text at either the data or model level. However, noise addition almost inevitably leads to considerable utility loss, highlighting one major drawback of DP in NLP. In this work, we introduce a new sentence infilling privatization technique, and we use this method to explore the effect of noise in DP text rewriting. We empirically demonstrate that non-DP privatization techniques excel in utility preservation and can find an acceptable empirical privacy-utility trade-off, yet cannot outperform DP methods in empirical privacy protections. Our results highlight the significant impact of noise in current DP rewriting mechanisms, leading to a discussion of the merits and challenges of DP in NLP, as well as the opportunities that non-DP methods present.
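To make the noise-versus-utility tension concrete, here is a minimal sketch of the kind of calibrated noise addition the abstract describes: a Laplace mechanism applied to a text embedding vector, with noise scale $b = \text{sensitivity}/\varepsilon$. This is an illustrative example only, not the paper's mechanism; the function names, the embedding, and the $\varepsilon$ values are all hypothetical.

```python
import math
import random

def sample_laplace(scale, rng):
    # Inverse-CDF sampling of a zero-mean Laplace variate with the given scale.
    u = rng.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def privatize_embedding(vec, epsilon, sensitivity=1.0, seed=0):
    # Laplace mechanism on a dense text representation: noise scale
    # b = sensitivity / epsilon, so a smaller epsilon (stronger privacy
    # guarantee) injects proportionally larger noise into every dimension.
    rng = random.Random(seed)
    b = sensitivity / epsilon
    return [x + sample_laplace(b, rng) for x in vec]

# Toy 4-dimensional "embedding" (hypothetical); same seed isolates the
# effect of epsilon on the magnitude of the perturbation.
emb = [0.0, 0.0, 0.0, 0.0]
strong = privatize_embedding(emb, epsilon=0.1)   # strict privacy, heavy noise
weak = privatize_embedding(emb, epsilon=10.0)    # loose privacy, light noise
```

With identical random draws, the $\varepsilon = 0.1$ output is perturbed 100 times more strongly than the $\varepsilon = 10$ output, which is precisely the utility degradation the paper attributes to noise in DP rewriting.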