🤖 AI Summary
This study investigates user acceptance of differential privacy (DP) mechanisms for text privatization and the factors that shape it. We conducted a mixed-methods global survey with 721 non-expert participants, employing factorial experiments and statistical modeling to analyze how contextual factors—including application scenario, data sensitivity, DP mechanism type, and data collection purpose—influence user preferences. Analysis of qualitative feedback further assessed users' tolerance for utility loss in sanitized text, particularly regarding semantic coherence and practical usability. Our key finding is that users are significantly more sensitive to output utility degradation than current DP budgeting practices assume; we thus introduce the "acceptable utility lower bound" as a human-centered evaluation benchmark. This work fills a critical gap in privacy-preserving NLP by providing the first large-scale, empirically grounded, user-centric analysis, offering both theoretical foundations and actionable design guidelines for developing DP-NLP systems aligned with real-world user expectations and requirements.
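As context for how responses from such a factorial design can be modeled, the sketch below fits a logistic regression over simulated acceptance decisions. The factor names, levels, effect sizes, and model specification here are illustrative assumptions, not the paper's actual instrument or analysis.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 400

# Hypothetical factorial survey data: each row is one vignette response.
# Factor names and levels are made up for illustration.
df = pd.DataFrame({
    "scenario":    rng.choice(["medical", "review", "chat"], size=n),
    "sensitivity": rng.choice(["low", "high"], size=n),
    "mechanism":   rng.choice(["word_level", "doc_level"], size=n),
    "purpose":     rng.choice(["research", "advertising"], size=n),
})

# Simulate binary acceptance decisions with an invented effect structure.
linpred = (0.3
           - 1.0 * (df["sensitivity"] == "high")
           + 0.5 * (df["purpose"] == "research")
           + 0.4 * (df["mechanism"] == "word_level"))
df["accept"] = rng.binomial(1, 1 / (1 + np.exp(-linpred)))

# Logistic regression with the manipulated factors as categorical predictors.
model = smf.logit(
    "accept ~ C(scenario) + C(sensitivity) + C(mechanism) + C(purpose)",
    data=df,
).fit(disp=0)
print(model.summary())
```

In a real study, such a model (or a mixed-effects variant accounting for repeated measures per participant) estimates how strongly each manipulated factor shifts the odds of accepting a privatized text.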
📝 Abstract
Recent literature has seen a considerable uptick in *Differentially Private Natural Language Processing* (DP NLP). This includes DP text privatization, where potentially sensitive input texts are transformed under DP to achieve privatized output texts that ideally mask sensitive information *and* maintain original semantics. Despite continued work to address the open challenges in DP text privatization, there remains a scarcity of work addressing user perceptions of this technology, a crucial aspect which serves as the final barrier to practical adoption. In this work, we conduct a survey study with 721 laypersons around the globe, investigating how the factors of *scenario*, *data sensitivity*, *mechanism type*, and *reason for data collection* impact user preferences for text privatization. We learn that while all these factors play a role in influencing privacy decisions, users are highly sensitive to the utility and coherence of the private output texts. Our findings highlight the socio-technical factors that must be considered in the study of DP NLP, opening the door to further user-based investigations going forward.
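To make the mechanism side of this concrete, here is a minimal sketch of one common family of word-level DP text privatization mechanisms: noise is added to a word's embedding, and the noisy vector is mapped back to the nearest vocabulary word. The toy vocabulary, 2-D embeddings, and epsilon value are illustrative assumptions, not the specific mechanisms evaluated in the survey.

```python
import numpy as np

# Toy vocabulary with 2-D "embeddings"; a real system would use pretrained
# word vectors (e.g., GloVe) over a full vocabulary.
VOCAB = {
    "doctor":  np.array([0.9, 0.1]),
    "nurse":   np.array([0.8, 0.2]),
    "teacher": np.array([0.1, 0.9]),
    "tutor":   np.array([0.2, 0.8]),
}

def privatize_word(word: str, epsilon: float, rng: np.random.Generator) -> str:
    """Perturb a word's embedding with multivariate Laplace-style noise and
    return the nearest vocabulary word (a metric-DP word replacement)."""
    vec = VOCAB[word]
    # One common construction samples noise with density proportional to
    # exp(-epsilon * ||z||): a uniform random direction scaled by a
    # Gamma-distributed radius.
    direction = rng.normal(size=vec.shape)
    direction /= np.linalg.norm(direction)
    radius = rng.gamma(shape=vec.shape[0], scale=1.0 / epsilon)
    noisy = vec + radius * direction
    # Post-process: map the noisy vector back to the closest word.
    return min(VOCAB, key=lambda w: np.linalg.norm(VOCAB[w] - noisy))

rng = np.random.default_rng(0)
sentence = ["doctor", "teacher"]
private = [privatize_word(w, epsilon=5.0, rng=rng) for w in sentence]
print(private)  # Output varies with epsilon and the random seed.
```

Lower epsilon values inject more noise and thus more often swap words for semantically distant neighbors, which is exactly the utility and coherence degradation that survey participants reacted to most strongly.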