🤖 AI Summary
This work addresses the fundamental trade-off between privacy and utility in differential privacy (DP) text rewriting, where injected noise inevitably distorts meaning. The authors propose a sentence-level infilling privatization framework and use it to conduct a controlled empirical study isolating noise as a primary cause of DP utility loss. Through a systematic comparison of DP and non-DP text sanitization methods, they find that standard DP perturbations substantially degrade BLEU, BERTScore, and downstream task accuracy, whereas non-DP methods, though lacking formal privacy guarantees, achieve superior textual quality and an acceptable empirical privacy-utility trade-off, even if they cannot outperform DP methods in empirical privacy protection (e.g., robustness against membership inference attacks). The study establishes an empirical benchmark for NLP privacy research, prompts methodological reflection on the practicality of strict DP in language applications, and introduces a utility-aware privatization paradigm grounded in controllable sentence infilling.
📝 Abstract
The field of text privatization often leverages the notion of $\textit{Differential Privacy}$ (DP) to provide formal guarantees in the rewriting or obfuscation of sensitive textual data. A common and nearly ubiquitous form of DP application necessitates the addition of calibrated noise, governed by the privacy parameter $\varepsilon$, to vector representations of text at either the data or model level. However, noise addition almost inevitably leads to considerable utility loss, highlighting one major drawback of DP in NLP. In this work, we introduce a new sentence infilling privatization technique, and we use this method to explore the effect of noise in DP text rewriting. We empirically demonstrate that non-DP privatization techniques excel in utility preservation and can find an acceptable empirical privacy-utility trade-off, yet cannot outperform DP methods in empirical privacy protections. Our results highlight the significant impact of noise in current DP rewriting mechanisms, leading to a discussion of the merits and challenges of DP in NLP, as well as the opportunities that non-DP methods present.
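To make the noise-versus-utility tension concrete, here is a minimal sketch of the kind of calibrated noise addition the abstract describes: a Laplace mechanism applied to a text embedding vector, with noise scale $b = \text{sensitivity}/\varepsilon$. This is an illustrative example only, not the paper's mechanism; the function names, the embedding, and the $\varepsilon$ values are all hypothetical.

```python
import math
import random

def sample_laplace(scale, rng):
    # Inverse-CDF sampling of a zero-mean Laplace variate with the given scale.
    u = rng.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def privatize_embedding(vec, epsilon, sensitivity=1.0, seed=0):
    # Laplace mechanism on a dense text representation: noise scale
    # b = sensitivity / epsilon, so a smaller epsilon (stronger privacy
    # guarantee) injects proportionally larger noise into every dimension.
    rng = random.Random(seed)
    b = sensitivity / epsilon
    return [x + sample_laplace(b, rng) for x in vec]

# Toy 4-dimensional "embedding" (hypothetical); same seed isolates the
# effect of epsilon on the magnitude of the perturbation.
emb = [0.0, 0.0, 0.0, 0.0]
strong = privatize_embedding(emb, epsilon=0.1)   # strict privacy, heavy noise
weak = privatize_embedding(emb, epsilon=10.0)    # loose privacy, light noise
```

With identical random draws, the $\varepsilon = 0.1$ output is perturbed 100 times more strongly than the $\varepsilon = 10$ output, which is precisely the utility degradation the paper attributes to noise in DP rewriting.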