π€ AI Summary
To address the low identification accuracy of infrequent items and high communication/computation overhead in frequent item statistics under Local Differential Privacy (LDP), this paper proposes the first succinct histogram protocol for (Ξ΅,Ξ΄)-LDP that integrates coding enhancement. Its core innovation lies in incorporating polar codes with successive cancellation list (SCL) decoding into LDP frequency estimation, coupled with a Gaussian perturbation mechanism to enable soft decoding and noise-robust frequency reconstruction. Unlike conventional LDP protocols, the proposed method significantly improves detection accuracy for infrequent items while reducing communication bandwidth and client-side computational loadβall without compromising rigorous privacy guarantees. Experiments on multiple real-world datasets demonstrate that our approach reduces frequency estimation error by 23%β41% compared to state-of-the-art methods, making it particularly suitable for privacy-sensitive large-scale machine learning applications.
π Abstract
A succinct histogram captures frequent items and their frequencies across clients and has become increasingly important for large-scale, privacy-sensitive machine learning applications. To develop a rigorous framework to guarantee privacy for the succinct histogram problem, local differential privacy (LDP) has been utilized and shown promising results. To preserve data utility under LDP, which essentially works by intentionally adding noise to data, error-correcting codes naturally emerge as a promising tool for reliable information collection. This work presents the first practical $(Ξ΅,Ξ΄)$-LDP protocol for constructing succinct histograms using error-correcting codes. To this end, polar codes and their successive-cancellation list (SCL) decoding algorithms are leveraged as the underlying coding scheme. More specifically, our protocol introduces Gaussian-based perturbations to enable efficient soft decoding. Experiments demonstrate that our approach outperforms prior methods, particularly for items with low true frequencies, while maintaining similar frequency estimation accuracy.