🤖 AI Summary
Post-processing redundant noisy count tables under differential privacy suffers from high variance and computational intractability, especially for large-scale census-level data.
Method: We propose SEA BLUE (Scalable Efficient Algorithm for Best Linear Unbiased Estimate), a two-step procedure of aggregation and differencing that enforces self-consistency through a linear, unbiased estimator grounded in matrix projection theory.
Contribution/Results: SEA BLUE produces unbiased, self-consistent linear estimates for high-dimensional structured private data and attains the minimum-variance solution under certain structural assumptions; empirically, it remains robust when those assumptions are violated. It provides three methods for constructing confidence intervals, each under different assumptions. Evaluated on two 2010 U.S. Census demonstration products, SEA BLUE reduces memory and runtime overhead by over two orders of magnitude compared to standard projection methods. It scales to Census-size data, offering a practical minimum-variance post-processing approach for large-scale differentially private tabular data.
📝 Abstract
In differential privacy (DP) mechanisms, it can be beneficial to release "redundant" outputs, in the sense that some quantities can be estimated in multiple ways by combining different subsets of the privatized values. Indeed, the DP 2020 Decennial Census products published by the U.S. Census Bureau consist of such redundant noisy counts. When redundancy is present, the DP output can be improved by enforcing self-consistency (i.e., estimators obtained by combining different values result in the same estimate), and we show that the minimum-variance processing is a linear projection. However, standard projection algorithms are too computationally expensive, in terms of both memory and execution time, for applications such as the Decennial Census. We propose the Scalable Efficient Algorithm for Best Linear Unbiased Estimate (SEA BLUE), based on a two-step process of aggregation and differencing that 1) enforces self-consistency through a linear and unbiased procedure, 2) is computationally and memory efficient, 3) achieves the minimum-variance solution under certain structural assumptions, and 4) is empirically shown to be robust to violations of these structural assumptions. We propose three methods of calculating confidence intervals from our estimates, under various assumptions. Finally, we apply SEA BLUE to two 2010 Census demonstration products, illustrating its scalability and validity.
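To make the "linear projection" claim concrete, here is a minimal sketch of the standard dense projection on a toy problem: one noisy total and its two noisy parts. It is not the SEA BLUE algorithm itself, only the textbook weighted least-squares form of the best linear unbiased estimate, which is the kind of computation the abstract describes as too expensive at Census scale. The function name `blue_projection`, the design matrix `A`, and the example counts and variances are all assumptions made for illustration.

```python
import numpy as np


def blue_projection(A, y, noise_var):
    """Dense BLUE via weighted least squares (illustrative, not SEA BLUE).

    A         : (m, k) design matrix mapping k free counts to m released counts
    y         : (m,) noisy released counts
    noise_var : (m,) known variance of the independent noise on each count
    """
    W = np.diag(1.0 / np.asarray(noise_var, dtype=float))  # inverse-variance weights
    normal = A.T @ W @ A                     # k x k normal-equations matrix
    x_hat = np.linalg.solve(normal, A.T @ W @ y)  # estimate of the free counts
    y_hat = A @ x_hat                        # self-consistent version of all counts
    cov_x = np.linalg.inv(normal)            # covariance of x_hat
    return x_hat, y_hat, cov_x


# Hypothetical toy release: total = part_a + part_b, each measured with noise.
A = np.array([[1.0, 1.0],   # total
              [1.0, 0.0],   # part_a
              [0.0, 1.0]])  # part_b
y = np.array([10.0, 4.0, 5.0])       # noisy values; note 4 + 5 != 10
noise_var = np.array([1.0, 1.0, 1.0])

x_hat, y_hat, cov_x = blue_projection(A, y, noise_var)
print("adjusted counts:", y_hat)
print("total minus sum of parts:", y_hat[0] - (y_hat[1] + y_hat[2]))
```

With equal variances, the one-unit discrepancy between the total and the sum of its parts is spread evenly over the three counts, so the adjusted total equals the sum of the adjusted parts. The dense solve costs on the order of k³ time and k² memory in the number of free counts, which illustrates why a structured two-step alternative is attractive for very large tables.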