🤖 AI Summary
Differential privacy typically requires bounded data, posing challenges for handling unbounded datasets. This work proposes Public-moment-guided Truncation (PMT), a novel method that leverages second-order moment information from a small amount of public data to adaptively transform and truncate private data. The truncation radius in PMT depends only on non-sensitive parameters such as dimensionality and sample size. This approach substantially improves the condition number of the data, enhances robustness to noise, and supports mapping back to the original space. Theoretical analysis demonstrates that PMT achieves tighter error bounds, better convergence, and stronger robustness compared to existing methods. Experiments on both synthetic and real-world datasets confirm that PMT significantly improves model accuracy and stability while preserving differential privacy.
📝 Abstract
Data privacy is important in the AI era, and differential privacy (DP) is one of the gold-standard solutions. However, DP typically applies only when the data come from a bounded underlying distribution. We address this limitation by leveraging second-moment information from a small amount of public data. We propose Public-moment-guided Truncation (PMT), which transforms private data using the public second-moment matrix and applies a principled truncation whose radius depends only on non-private quantities: the data dimension and the sample size. This transformation yields a well-conditioned second-moment matrix, making its inversion significantly more robust to DP noise. Furthermore, we demonstrate the applicability of PMT on penalized and generalized linear regressions. Specifically, we design new loss functions and algorithms that ensure solutions in the transformed space can be mapped back to the original domain. We establish improvements in the models' DP estimation through theoretical error bounds, robustness guarantees, and convergence results, and attribute the gains to the conditioning effect of PMT. Experiments on synthetic and real datasets confirm that PMT substantially improves the accuracy and stability of DP models.
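The transform-then-truncate idea described in the abstract can be illustrated with a minimal NumPy sketch. This is not the paper's algorithm. It assumes the transformation is a whitening by the inverse square root of the public second-moment matrix, and it uses a placeholder radius `sqrt(d) + sqrt(2 log n)`, chosen only because it depends on nothing but the dimension `d` and sample size `n`. All function names are hypothetical, and no DP noise is added, so the sketch is not private by itself.

```python
import numpy as np


def public_second_moment(x_pub):
    """Empirical second-moment matrix (1/n) X^T X of the public sample."""
    return x_pub.T @ x_pub / x_pub.shape[0]


def inverse_sqrt(sym, eps=1e-12):
    """Inverse square root of a symmetric PSD matrix via eigendecomposition."""
    vals, vecs = np.linalg.eigh(sym)
    vals = np.maximum(vals, eps)  # guard against zero or negative eigenvalues
    return (vecs / np.sqrt(vals)) @ vecs.T


def truncation_radius(d, n):
    """Placeholder radius (an assumption, not the paper's formula).

    It depends only on the non-private quantities d and n.
    """
    return np.sqrt(d) + np.sqrt(2.0 * np.log(n))


def pmt_transform(x_priv, x_pub):
    """Whiten private rows with the public moment, then clip row norms."""
    n, d = x_priv.shape
    w = inverse_sqrt(public_second_moment(x_pub))
    z = x_priv @ w  # w is symmetric, so each row becomes w @ x
    radius = truncation_radius(d, n)
    norms = np.linalg.norm(z, axis=1, keepdims=True)
    scale = np.minimum(1.0, radius / np.maximum(norms, 1e-12))
    return z * scale, w, radius


def map_back(theta_z, w):
    """Map linear coefficients from the transformed space to the original one.

    For untruncated rows, (x @ w) @ theta_z == x @ (w @ theta_z).
    """
    return w @ theta_z


# Illustrative data with badly scaled features (synthetic, for the demo only).
rng = np.random.default_rng(0)
feature_scales = np.array([1.0, 10.0, 100.0])
x_pub = rng.standard_normal((200, 3)) * feature_scales
x_priv = rng.standard_normal((5000, 3)) * feature_scales

z, w, radius = pmt_transform(x_priv, x_pub)
cond_before = np.linalg.cond(public_second_moment(x_priv))
cond_after = np.linalg.cond(public_second_moment(z))
print(f"condition number before: {cond_before:.1f}, after: {cond_after:.2f}")
```

In this toy setup, the second-moment matrix of the transformed data should be far better conditioned than that of the raw data, which is the property the abstract credits for the robustness to DP noise. The bounded row norms are what would let a downstream DP mechanism calibrate its noise.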