🤖 AI Summary
This paper addresses the problem of privately representing size-$k$ subsets of a large universe under differential privacy, aiming to minimize storage overhead while ensuring accurate decoding. The authors propose a novel construction paradigm based on random linear coding—departing from conventional noise-addition approaches—and introduce the first method that embeds sets into a random linear system for privacy-preserving representation. They design two mechanisms: (1) an $(\varepsilon,\delta)$-DP scheme using roughly $1.05\,k\varepsilon \cdot \log(e)$ bits of space with error probability matching the tight lower bound $1/(e^\varepsilon+1)$; and (2) a pure $\varepsilon$-DP variant with improved space efficiency but higher decoding cost. Theoretically, they establish the first space lower bound for this problem and prove their constructions are optimal up to small constant factors, thereby providing a tight characterization of the privacy–utility trade-off.
📝 Abstract
We study the problem of differentially private (DP) mechanisms for representing sets of size $k$ from a large universe. Our first construction creates $(\epsilon,\delta)$-DP representations with error probability $1/(e^\epsilon + 1)$ using space at most $1.05\, k \epsilon \cdot \log(e)$ bits, where the time to construct a representation is $O(k \log(1/\delta))$ and decoding time is $O(\log(1/\delta))$. We also present a second algorithm for pure $\epsilon$-DP representations with the same error using space at most $k \epsilon \cdot \log(e)$ bits, but requiring large decoding times. Our algorithms match our lower bounds on privacy–utility trade-offs (including constants but ignoring $\delta$ factors), and we also present a new space lower bound matching our constructions up to small constant factors. To obtain our results, we design a new approach embedding sets into random linear systems, deviating from most prior approaches that inject noise into non-private solutions.
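The error bound $1/(e^\epsilon + 1)$ is the same quantity that appears as the optimal flip probability of binary randomized response. The sketch below (illustrative only — it is *not* the paper's random-linear-coding construction) shows where that expression comes from: reporting a bit truthfully with probability $e^\epsilon/(e^\epsilon+1)$ is $\epsilon$-DP, and its per-query error rate is exactly $1/(e^\epsilon+1)$. All function names here are hypothetical.

```python
import math
import random

def flip_probability(eps: float) -> float:
    """Per-bit error of eps-DP binary randomized response: 1 / (e^eps + 1).

    Keeping the true bit with probability e^eps / (e^eps + 1) makes the
    ratio of output likelihoods between neighboring inputs exactly e^eps,
    which is the eps-DP constraint for a single bit.
    """
    return 1.0 / (math.exp(eps) + 1.0)

def randomized_response(bit: int, eps: float, rng: random.Random) -> int:
    """Report the true bit with prob e^eps/(e^eps+1); otherwise flip it."""
    return bit ^ (1 if rng.random() < flip_probability(eps) else 0)

if __name__ == "__main__":
    # Empirically check the error rate for eps = 1: should be near
    # 1 / (e + 1) ~= 0.269.
    rng = random.Random(0)
    trials = 200_000
    errors = sum(randomized_response(1, 1.0, rng) != 1 for _ in range(trials))
    print(f"empirical error: {errors / trials:.3f}, "
          f"predicted: {flip_probability(1.0):.3f}")
```

The paper's contribution is achieving this same per-element error for size-$k$ *sets* while using only about $k\epsilon \cdot \log(e)$ bits in total, rather than one noisy bit per universe element.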