🤖 AI Summary
Problem: Differentially private release of matrices built from sparse vectors (social network adjacency matrices, recommendation interaction matrices, SNP data) traditionally relies on randomized response. For N users and n columns, its communication cost scales as Ω(nN), which is infeasible at large scale.
Method: We propose an efficient ε-differentially private algorithm whose communication cost *decreases* as the privacy budget ε shrinks. For a matrix with m nonzero elements it achieves O(εm) communication, which is below the Ω(m log n) required in the non-private case. Our approach integrates a refined randomized response mechanism with sparse-structure-aware encoding and lightweight aggregation.
Contribution/Results: Under ε-differential privacy, our method reduces both communication and computational overhead. Theoretical analysis shows that its output is identical to that of randomized response, and experiments confirm its accuracy while communication drops sharply, which is especially beneficial for massive sparse data. This is the first scheme to break the non-private communication barrier in differentially private sparse vector release.
📝 Abstract
In this work, we propose a differentially private algorithm for publishing matrices aggregated from sparse vectors. These matrices include social network adjacency matrices, user-item interaction matrices in recommendation systems, and single nucleotide polymorphisms (SNPs) in DNA data. Traditionally, differential privacy in vector collection relies on randomized response, but this approach incurs high communication costs. Specifically, for a matrix with $N$ users, $n$ columns, and $m$ nonzero elements, conventional methods require $\Omega(n \times N)$ communication, making them impractical for large-scale data. Our algorithm significantly reduces this cost to $O(\varepsilon m)$, where $\varepsilon$ is the privacy budget. Notably, this is even lower than the non-private case, which requires $\Omega(m \log n)$ communication. Moreover, as the privacy budget decreases, the communication cost decreases further, enabling better privacy with improved efficiency. We theoretically prove that our method yields results identical to those of randomized response, and experimental evaluations confirm its effectiveness in terms of accuracy, communication efficiency, and computational complexity.
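To make the baseline costs concrete, the following is a minimal sketch of classical per-bit randomized response on one user's sparse binary row, together with the two reference costs quoted above. It is not the proposed algorithm, which the abstract does not specify. The function names, the per-bit interpretation of $\varepsilon$, and the choice of $\lceil \log_2 n \rceil$ bits per transmitted index are assumptions made only for illustration.

```python
import math
import random


def keep_probability(epsilon):
    """Probability that randomized response reports a bit truthfully."""
    return math.exp(epsilon) / (1.0 + math.exp(epsilon))


def randomized_response(nonzero_columns, n, epsilon, rng):
    """Perturb one user's binary row, given as a set of nonzero column indices.

    Each of the n bits is flipped independently, so the report is dense:
    the user transmits all n bits no matter how sparse the input was.
    """
    p = keep_probability(epsilon)
    report = []
    for j in range(n):
        bit = 1 if j in nonzero_columns else 0
        report.append(bit if rng.random() < p else 1 - bit)
    return report


def debias_column_sum(noisy_sum, num_users, epsilon):
    """Unbiased estimate of a true column sum from the sum of noisy bits.

    Uses E[noisy_sum] = s * p + (num_users - s) * (1 - p); requires epsilon > 0.
    """
    p = keep_probability(epsilon)
    return (noisy_sum - num_users * (1.0 - p)) / (2.0 * p - 1.0)


def dense_cost_bits(num_users, n):
    """Bits sent when every user transmits a full n-bit report (n x N)."""
    return num_users * n


def sparse_index_cost_bits(m, n):
    """Bits sent without privacy when each nonzero is sent as a column index.

    Assumes ceil(log2 n) bits per index (illustrative encoding).
    """
    return m * math.ceil(math.log2(n))
```

Under these assumptions, a user with a handful of nonzero entries still sends all $n$ bits under randomized response, which is where the $n \times N$ total comes from, while the non-private index encoding pays roughly $\log n$ bits per nonzero. The sketch shows only these two reference points; it does not reproduce how the proposed method reaches $O(\varepsilon m)$.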