Augmented Shuffle Differential Privacy Protocols for Large-Domain Categorical and Key-Value Data

📅 2025-09-02

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Most shuffle differential privacy (DP) protocols are vulnerable to collusion and data poisoning attacks. The existing augmented shuffle DP protocol resists both, but it cannot efficiently handle large-domain categorical or key-value data because of prohibitive communication and computational costs. To address this limitation, we propose FME (Filtering-with-Multiple-Encryption), a protocol that extends augmented shuffle DP to large-domain settings. FME combines hash-based filtering (to prune unpopular items), multiple encryption, random sampling, and dummy data addition, and adds a bias-reduction technique for key-value statistics. We prove that FME satisfies computational DP and is robust to both collusion and data poisoning attacks. Comparisons with twelve existing protocols indicate that FME improves both accuracy and efficiency in frequency estimation while preserving these privacy guarantees.

📝 Abstract

Shuffle DP (Differential Privacy) protocols provide high accuracy and privacy by introducing a shuffler that randomly shuffles data in a distributed system. However, most shuffle DP protocols are vulnerable to two attacks: collusion attacks by the data collector and users, and data poisoning attacks. A recent study addresses this issue by introducing an augmented shuffle DP protocol, where users do not add noise and the shuffler performs random sampling and dummy data addition. However, it focuses on frequency estimation over categorical data with a small domain and cannot be applied to a large domain due to prohibitively high communication and computational costs. In this paper, we fill this gap by introducing a novel augmented shuffle DP protocol called the FME (Filtering-with-Multiple-Encryption) protocol. Our FME protocol uses a hash function to filter out unpopular items and then accurately calculates frequencies for popular items. To perform this within one round of interaction between users and the shuffler, our protocol carefully structures communication within the system using multiple encryption. We also apply our FME protocol to more advanced KV (Key-Value) statistics estimation with an additional technique to reduce bias. For both categorical and KV data, we prove that our protocol provides computational DP, high robustness to the above two attacks, accuracy, and efficiency. We show the effectiveness of our proposals through comparisons with twelve existing protocols.
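
The filter-then-count idea in the abstract can be illustrated with a minimal, non-private sketch. It is not the FME protocol itself: there is no encryption, sampling, or dummy data here, and the names `bucket_of` and `filter_then_count`, the use of SHA-256, and the `num_buckets` and `threshold` parameters are all assumptions made for illustration.

```python
import hashlib
from collections import Counter


def bucket_of(item: str, num_buckets: int) -> int:
    """Map an item to a bucket with a fixed hash (SHA-256 is an assumption)."""
    digest = hashlib.sha256(item.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_buckets


def filter_then_count(items, num_buckets, threshold):
    """Hypothetical two-stage count: filter by hashed bucket, then count items."""
    # Stage 1: histogram over hash values only. It has at most num_buckets
    # entries, regardless of how large the item domain is.
    bucket_counts = Counter(bucket_of(x, num_buckets) for x in items)
    popular = {b for b, c in bucket_counts.items() if c >= threshold}
    # Stage 2: exact per-item counts, restricted to items in popular buckets.
    # An unpopular item survives only if it collides with a popular bucket.
    return Counter(x for x in items if bucket_of(x, num_buckets) in popular)


if __name__ == "__main__":
    data = ["a"] * 50 + ["b"] * 30 + ["rare%d" % i for i in range(20)]
    print(filter_then_count(data, num_buckets=64, threshold=10))
```

The point of the sketch is the cost structure: the first pass works over the hash range rather than the full domain, and only the surviving items need exact counting.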

Problem

Research questions and friction points this paper is trying to address.

Addresses vulnerabilities to collusion and data poisoning attacks in shuffle DP

Enables frequency estimation for large-domain categorical data efficiently

Extends protocol to key-value statistics estimation with bias reduction

Innovation

Methods, ideas, or system contributions that make the work stand out.

FME protocol uses hash filtering for large domains

Multiple encryption enables single-round user-shuffler interaction

Computational DP with robustness against collusion and poisoning
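
The shuffler-side operations that the abstract attributes to augmented shuffle DP (random sampling and dummy data addition, with no user-side noise) can be sketched roughly as follows. This is a toy illustration under stated assumptions, not the paper's algorithm and not a DP mechanism as written: the function names, the sampling rate `beta`, and the uniform dummy-count distribution with mean `mean_dummies` are placeholders chosen only to show how the collector can remove the bias that sampling and dummies introduce.

```python
import random
from collections import Counter


def shuffle_with_dummies(reports, domain, beta, mean_dummies, rng):
    """Hypothetical shuffler: sample reports, add dummies, then shuffle."""
    # Keep each user report independently with probability beta.
    sampled = [r for r in reports if rng.random() < beta]
    # Add a random number of dummies per item. Uniform on
    # {0, ..., 2 * mean_dummies} is a placeholder distribution (assumption).
    dummies = []
    for v in domain:
        dummies.extend([v] * rng.randint(0, 2 * mean_dummies))
    out = sampled + dummies
    rng.shuffle(out)  # hide which position each value came from
    return out


def estimate_counts(shuffled, domain, beta, mean_dummies):
    """Collector-side estimate: subtract expected dummies, rescale by 1/beta."""
    counts = Counter(shuffled)
    return {v: (counts[v] - mean_dummies) / beta for v in domain}


if __name__ == "__main__":
    rng = random.Random(0)
    reports = ["x"] * 600 + ["y"] * 400
    out = shuffle_with_dummies(reports, ["x", "y", "z"], 0.5, 20, rng)
    print(estimate_counts(out, ["x", "y", "z"], 0.5, 20))
```

Because the expected number of dummies per item is `mean_dummies` and each report survives with probability `beta`, the estimate is unbiased for the true count under these assumptions.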

🔎 Similar Papers

Differentially Private Federated Learning: A Systematic Review