🤖 AI Summary
This paper addresses the challenge of differentially private (DP) statistical estimation for data with unbounded support. We propose the first generic DP estimation framework based on systematic data truncation, applicable to high-dimensional exponential families, including Gaussian mean and covariance estimation. Unlike conventional approaches that rely on problem-specific sensitivity analysis, our method bounds sensitivity by truncating the data and uses techniques from truncated statistics, together with an improved uniform convergence bound for the truncated log-likelihood, to correct the bias that truncation introduces. The algorithm integrates data truncation, maximum likelihood estimation, and DP stochastic gradient descent, achieving computational efficiency alongside near-optimal sample complexity. Experiments demonstrate state-of-the-art privacy–accuracy trade-offs for Gaussian mean and covariance estimation. Our framework establishes a scalable new paradigm for DP statistical modeling over unbounded data.
📝 Abstract
We introduce a novel framework for differentially private (DP) statistical estimation via data truncation, addressing a key challenge in DP estimation when the data support is unbounded. Traditional approaches rely on problem-specific sensitivity analysis, which limits their applicability. By leveraging techniques from truncated statistics, we develop computationally efficient DP estimators for exponential family distributions, including Gaussian mean and covariance estimation, achieving near-optimal sample complexity. Previous work on exponential families considers only bounded or one-dimensional families. Our approach mitigates sensitivity through truncation while carefully correcting for the introduced bias using maximum likelihood estimation and DP stochastic gradient descent. Along the way, we establish improved uniform convergence guarantees for the log-likelihood function of exponential families, which may be of independent interest. Our results provide a general blueprint for DP algorithm design via truncated statistics.
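To make the truncate, then correct, then privatize idea concrete, here is a minimal one-dimensional sketch. It is not the paper's algorithm. It assumes a Gaussian with known unit variance, a fixed truncation interval `[a, b]`, and full-batch noisy gradient descent in place of DP-SGD. The function names and the `noise_multiplier` parameter are hypothetical, and the code does not calibrate the noise to any target (ε, δ). The point it illustrates is that the per-sample gradient of the truncated negative log-likelihood equals the truncated mean under the current parameter minus the sample, so it is bounded by `b - a` without problem-specific sensitivity analysis.

```python
import math
import numpy as np

def std_normal_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def std_normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def truncated_mean(mu, a, b):
    """Mean of N(mu, 1) conditioned on the interval [a, b]."""
    mass = std_normal_cdf(b - mu) - std_normal_cdf(a - mu)
    return mu + (std_normal_pdf(a - mu) - std_normal_pdf(b - mu)) / mass

def dp_truncated_gaussian_mean(x, a, b, steps=200, lr=0.5,
                               noise_multiplier=0.05, seed=0):
    """Illustrative sketch only: noisy gradient descent on the truncated NLL.

    noise_multiplier is a hypothetical knob; privacy accounting is omitted.
    """
    rng = np.random.default_rng(seed)
    kept = x[(x >= a) & (x <= b)]          # truncation: keep samples inside [a, b]
    n = len(kept)
    kept_mean = float(kept.mean())
    mu = 0.5 * (a + b)                     # start at the interval midpoint
    for _ in range(steps):
        # Average gradient of the truncated NLL. Each per-sample gradient,
        # truncated_mean(mu) - x_i, lies in [-(b - a), b - a].
        grad = truncated_mean(mu, a, b) - kept_mean
        # Gaussian noise scaled to the per-sample gradient bound (b - a).
        grad += rng.normal(0.0, noise_multiplier * (b - a) / n)
        # Gradient step, then projection back onto [a, b].
        mu = float(np.clip(mu - lr * grad, a, b))
    return mu
```

For samples drawn from N(1, 1) and truncated to `[-1, 2]`, the plain average of the kept samples falls noticeably below 1 because the interval cuts off more of the upper tail. The truncated maximum likelihood estimate corrects for this, which is the bias-correction role that maximum likelihood estimation plays in the abstract.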