AI Summary
Existing augmented shuffle differential privacy (DP) protocols cannot efficiently handle categorical or key-value (KV) data over a large domain, because their communication and computational costs become prohibitive. Ordinary shuffle DP protocols, in turn, are vulnerable to collusion and data poisoning attacks. To address these limitations, the authors propose FME (Filtering-with-Multiple-Encryption), a protocol that extends augmented shuffle DP to large-domain settings. FME combines hash-based filtering (to prune unpopular items), multiple encryption, random sampling, and dummy data addition, and for KV data it adds a bias-reduction technique to improve accuracy. The authors prove that FME provides computational DP and high robustness to both attacks. Comparisons with twelve existing protocols show that FME improves both accuracy and efficiency in frequency estimation, which makes private statistics over large domains practical while preserving the privacy guarantees.
Abstract
Shuffle DP (Differential Privacy) protocols provide high accuracy and privacy by introducing a shuffler who randomly shuffles data in a distributed system. However, most shuffle DP protocols are vulnerable to two attacks: collusion attacks by the data collector and users, and data poisoning attacks. A recent study addresses this issue by introducing an augmented shuffle DP protocol, in which users do not add noise and the shuffler instead performs random sampling and dummy data addition. However, that study focuses on frequency estimation over categorical data with a small domain, and its protocol cannot be applied to a large domain because of prohibitively high communication and computational costs.
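To make the shuffler's role concrete, the following is a minimal Python sketch of the "random sampling plus dummy data addition" idea, not the cited protocol itself. The function names, the sampling rate `beta`, and the choice of a binomial dummy-count distribution are all assumptions made for illustration; encryption and the privacy analysis are omitted entirely.

```python
import random
from collections import Counter


def augmented_shuffle(values, domain, beta, dummy_trials, dummy_prob, rng):
    """Hypothetical shuffler step: sample inputs, add dummies, shuffle."""
    # Keep each user's (un-noised) value independently with probability beta.
    sampled = [v for v in values if rng.random() < beta]
    # For every item in the domain, add Binomial(dummy_trials, dummy_prob)
    # dummy copies. The binomial choice is an assumption for this sketch.
    dummies = []
    for item in domain:
        k = sum(rng.random() < dummy_prob for _ in range(dummy_trials))
        dummies.extend([item] * k)
    out = sampled + dummies
    rng.shuffle(out)  # hide which entries came from which user
    return out


def estimate_counts(shuffled, domain, beta, dummy_trials, dummy_prob):
    """Hypothetical collector step: undo the expected dummy count and sampling."""
    counts = Counter(shuffled)
    expected_dummies = dummy_trials * dummy_prob
    return {item: (counts[item] - expected_dummies) / beta for item in domain}
```

Note that the dummy loop visits every item in the domain, so the shuffler's work and output size in this sketch grow with the domain size. That is in line with the cost problem the abstract describes for large domains.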
In this paper, we fill this gap by introducing a novel augmented shuffle DP protocol called the FME (Filtering-with-Multiple-Encryption) protocol. Our FME protocol uses a hash function to filter out unpopular items and then accurately calculates frequencies for popular items. To perform this within one round of interaction between users and the shuffler, our protocol carefully communicates within a system using multiple encryption. We also apply our FME protocol to more advanced KV (Key-Value) statistics estimation with an additional technique to reduce bias. For both categorical and KV data, we prove that our protocol provides computational DP, high robustness to the above two attacks, accuracy, and efficiency. We show the effectiveness of our proposals through comparisons with twelve existing protocols.
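The filtering idea can be illustrated with a small, non-private Python sketch: items are first counted over a much smaller hashed domain, and exact frequencies are then computed only for items whose hash bucket looks popular. Everything here (`bucket_of`, `filtered_frequencies`, `num_buckets`, `threshold`, and the use of SHA-256) is a hypothetical stand-in. The actual FME protocol performs this within one round using multiple encryption and with DP guarantees, none of which is modeled below.

```python
import hashlib
from collections import Counter


def bucket_of(item, num_buckets):
    """Map a string item to a bucket index with a deterministic hash."""
    digest = hashlib.sha256(item.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_buckets


def filtered_frequencies(items, num_buckets, threshold):
    """Two-stage count: filter by hashed bucket, then count surviving items."""
    # Stage 1: count over the small hashed domain instead of the full domain.
    bucket_counts = Counter(bucket_of(x, num_buckets) for x in items)
    popular = {b for b, c in bucket_counts.items() if c >= threshold}
    # Stage 2: exact counts only for items that fall into a popular bucket.
    # An unpopular item that collides with a popular one also survives.
    return Counter(x for x in items if bucket_of(x, num_buckets) in popular)
```

In this sketch, a smaller `num_buckets` makes the first stage cheaper but lets more unpopular items slip through via hash collisions, so the two parameters trade cost against filtering quality.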