🤖 AI Summary
This work addresses differentially private domain discovery, where each user holds a subset of items drawn from a shared but unknown domain and the goal is to output an informative subset of those items. For set union, the authors analyze the Weighted Gaussian Mechanism (WGM), a simple baseline, and show that it has a near-optimal ℓ₁ missing mass guarantee on Zipfian data together with a distribution-free ℓ∞ missing mass guarantee. They then use the WGM as a domain-discovery precursor for existing known-domain algorithms for private top-k and k-hitting set, obtaining new utility guarantees for the unknown-domain variants of those problems. Experiments indicate that the WGM-based methods are competitive with or outperform existing baselines on all three problems.
📝 Abstract
We study several problems in differentially private domain discovery, where each user holds a subset of items from a shared but unknown domain, and the goal is to output an informative subset of items. For set union, we show that the simple baseline Weighted Gaussian Mechanism (WGM) has a near-optimal $\ell_1$ missing mass guarantee on Zipfian data as well as a distribution-free $\ell_\infty$ missing mass guarantee. We then apply the WGM as a domain-discovery precursor for existing known-domain algorithms for private top-$k$ and $k$-hitting set and obtain new utility guarantees for their unknown domain variants. Finally, experiments demonstrate that all of our WGM-based methods are competitive with or outperform existing baselines for all three problems.
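The abstract does not spell out how the WGM operates, so the following is only a minimal sketch of a weighted Gaussian mechanism as it is commonly described for private set union, not the authors' implementation. The per-user weight of 1/sqrt(set size), the contribution cap `max_items`, and the parameters `sigma` and `threshold` are all assumptions made for illustration; in particular, the code does not calibrate `sigma` or `threshold` to any (ε, δ) privacy budget.

```python
import math
import random
from collections import defaultdict


def weighted_gaussian_mechanism(user_sets, max_items, sigma, threshold, rng=None):
    """Illustrative sketch: release items whose noisy weighted count exceeds a threshold.

    All parameter names are hypothetical. sigma and threshold must be chosen
    by a separate privacy analysis; this function does not do that.
    """
    rng = rng or random.Random()
    weights = defaultdict(float)

    for items in user_sets:
        items = sorted(set(items))  # deduplicate; sort for reproducible sampling
        if len(items) > max_items:
            # Cap each user's contribution (assumed preprocessing step).
            items = rng.sample(items, max_items)
        if not items:
            continue
        # Each user spreads unit l2 mass evenly over their items.
        w = 1.0 / math.sqrt(len(items))
        for item in items:
            weights[item] += w

    released = set()
    for item, weight in weights.items():
        # Add independent Gaussian noise and keep items above the threshold.
        if weight + rng.gauss(0.0, sigma) > threshold:
            released.add(item)
    return released
```

Only items held by at least one user can ever be released, which is what makes the approach usable when the domain is unknown; a sufficiently high threshold is what keeps items held by a single user from leaking.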