🤖 AI Summary
This work addresses the trade-off between privacy preservation and detection performance in machine-generated text detection. The authors propose DP-MGTD, a novel framework that enhances the distinguishability between human- and machine-generated texts by injecting differentially private noise. It introduces an adaptive entity sanitization algorithm that dynamically allocates privacy budgets through a two-stage mechanism (noisy frequency estimation followed by budget calibration), applying the Laplace mechanism to numerical entities and the Exponential mechanism to textual entities, thereby balancing utility and privacy. Experimental results on MGTBench-2.0 demonstrate that the proposed method achieves near-perfect detection accuracy while providing rigorous differential privacy guarantees, significantly outperforming non-private baselines.
📝 Abstract
The deployment of Machine-Generated Text (MGT) detection systems necessitates processing sensitive user data, creating a fundamental conflict between authorship verification and privacy preservation. Standard anonymization techniques often disrupt linguistic fluency, while rigorous Differential Privacy (DP) mechanisms typically degrade the statistical signals required for accurate detection. To resolve this dilemma, we propose \textbf{DP-MGTD}, a framework incorporating an Adaptive Differentially Private Entity Sanitization algorithm. Our approach utilizes a two-stage mechanism that performs noisy frequency estimation and dynamically calibrates privacy budgets, applying Laplace and Exponential mechanisms to numerical and textual entities respectively. Crucially, we identify a counter-intuitive phenomenon where the application of DP noise amplifies the distinguishability between human and machine text by exposing distinct sensitivity patterns to perturbation. Extensive experiments on the MGTBench-2.0 dataset show that our method achieves near-perfect detection accuracy, significantly outperforming non-private baselines while satisfying strict privacy guarantees.
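The two classical mechanisms the abstract mentions can be sketched in a few lines. The following is a minimal illustration, not the paper's actual algorithm: the sensitivity values, the candidate replacement pool, and the toy utility function are all hypothetical stand-ins, and the adaptive budget calibration stage is omitted.

```python
import math
import random

def laplace_mechanism(value: float, sensitivity: float, epsilon: float) -> float:
    """Standard Laplace mechanism: add Laplace(sensitivity/epsilon) noise
    to a numeric value, giving epsilon-DP for that query."""
    scale = sensitivity / epsilon
    u = random.random() - 0.5
    # Inverse-CDF sampling of the Laplace distribution
    return value - scale * math.copysign(math.log(1.0 - 2.0 * abs(u)), u)

def exponential_mechanism(candidates, utility, sensitivity: float, epsilon: float):
    """Standard Exponential mechanism: sample a candidate with probability
    proportional to exp(epsilon * utility / (2 * sensitivity))."""
    weights = [math.exp(epsilon * utility(c) / (2.0 * sensitivity)) for c in candidates]
    r = random.uniform(0.0, sum(weights))
    acc = 0.0
    for c, w in zip(candidates, weights):
        acc += w
        if r <= acc:
            return c
    return candidates[-1]

def sanitize_entity(entity, epsilon: float):
    """Hypothetical dispatch: numerical entities get Laplace noise,
    textual entities get an Exponential-mechanism replacement."""
    if isinstance(entity, (int, float)):
        return laplace_mechanism(float(entity), sensitivity=1.0, epsilon=epsilon)
    # Toy candidate pool and utility (length similarity); the real framework
    # would score candidates by noisy frequency estimates.
    candidates = ["[PERSON]", "[ORG]", "[LOC]"]
    utility = lambda c: -abs(len(c) - len(str(entity)))
    return exponential_mechanism(candidates, utility, sensitivity=1.0, epsilon=epsilon)
```

Under sequential composition, the budgets spent on the two mechanisms add up, which is why an adaptive per-entity allocation (as the abstract describes) matters for overall utility.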