🤖 AI Summary
This work investigates whether post-hoc interpretability and differential privacy (DP) can be jointly achieved in NLP, addressing the question of whether the two inherently conflict. The authors integrate DP mechanisms into text sanitization and systematically couple them with post-hoc explanation methods, including attention visualization and feature attribution. Through empirical analysis, they examine how task type, privacy budget (ε), and explanation granularity affect joint privacy-interpretability performance. Results show that DP protection and model interpretability are not fundamentally incompatible: under appropriately calibrated noise budgets, explanation granularities, and task configurations, the two can co-exist without severely degrading one another. To the authors' knowledge, this is the first study to propose a practical framework for integrating DP and post-hoc explanation in NLP, explicitly characterizing their coexistence conditions and providing reproducible configuration guidelines. The work lays groundwork for building trustworthy, transparent, and privacy-preserving NLP systems.
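To make the "DP text sanitization" step concrete, the sketch below illustrates one common word-level approach from the literature (metric-DP word replacement via noisy embeddings, in the style of Madlib-type mechanisms). The toy embeddings, the epsilon value, and the `privatize_word` helper are all hypothetical illustrations, not the paper's actual setup:

```python
import math
import random

random.seed(0)

# Toy 2-D word embeddings (hypothetical, for illustration only).
vocab = {
    "happy":   (0.9, 0.8),
    "glad":    (0.85, 0.75),
    "sad":     (-0.9, -0.8),
    "unhappy": (-0.85, -0.75),
}

def privatize_word(word: str, epsilon: float) -> str:
    """Perturb a word's embedding with noise scaled by 1/epsilon,
    then decode back to the nearest vocabulary word."""
    x, y = vocab[word]
    # Planar Laplace-style noise: uniform direction, Gamma(d, 1/eps) radius
    # (d = 2 for these toy embeddings).
    theta = random.uniform(0.0, 2.0 * math.pi)
    radius = random.gammavariate(2, 1.0 / epsilon)
    noisy = (x + radius * math.cos(theta), y + radius * math.sin(theta))
    # Nearest-neighbour decoding back into the vocabulary.
    return min(vocab, key=lambda w: math.dist(vocab[w], noisy))

# Smaller epsilon -> larger noise radius -> more word substitutions,
# hence stronger privacy but noisier inputs for downstream explanations.
sanitized = [privatize_word(w, epsilon=10.0) for w in ["happy", "sad"]]
print(sanitized)
```

The key trade-off the paper studies appears directly here: lowering epsilon increases the chance a word is replaced by a semantic neighbour (or an unrelated word), which protects the author but can distort the token-level attributions a post-hoc explainer later assigns.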
📝 Abstract
In the study of trustworthy Natural Language Processing (NLP), a number of important research fields have emerged, including those of *explainability* and *privacy*. While research interest in both explainable and privacy-preserving NLP has increased considerably in recent years, there remains a lack of investigation at the intersection of the two. This leaves a considerable gap in our understanding of whether achieving *both* explainability and privacy is possible, or whether the two are at odds with each other. In this work, we conduct an empirical investigation into the privacy-explainability trade-off in the context of NLP, guided by the popular overarching methods of *Differential Privacy* (DP) and post-hoc explainability. Our findings offer a view into the intricate relationship between privacy and explainability, which is shaped by a number of factors, including the nature of the downstream task and the choice of text privatization and explainability methods. In doing so, we highlight the potential for privacy and explainability to co-exist, and we summarize our findings in a collection of practical recommendations for future work at this important intersection.