Balls-and-Bins Sampling for DP-SGD

📅 2024-12-21

🏛️ arXiv.org

📈 Citations: 3

✨ Influential: 1

🤖 AI Summary

This paper addresses the mismatch between the sampling scheme used in DP-SGD implementations and the one assumed by their privacy accounting, and the resulting privacy-utility trade-off. We observe that shuffling is widely adopted in practice, but accounting for it as if Poisson subsampling were used underestimates its privacy cost. Poisson subsampling, conversely, provides rigorous privacy amplification but degrades utility. To bridge this gap, we propose Balls-and-Bins sampling, a randomized sampling mechanism grounded in the classic balls-into-bins (load-balancing) model. It achieves privacy amplification comparable to Poisson subsampling while keeping computational complexity close to that of shuffling. Our method is the first to simultaneously achieve this "dual optimality": privacy guarantees on par with Poisson subsampling and training utility approaching that of shuffling. This closes a critical gap in which privacy budgets are systematically underestimated in real-world deployments. Experiments demonstrate that, under identical noise scales, our approach attains significantly higher accuracy than Poisson sampling on CIFAR-10/100, while reducing privacy budget consumption by over 30% compared to shuffling, setting a new state-of-the-art in the privacy-utility trade-off.
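The following worked comparison makes the load-balancing view concrete. It uses assumed notation that does not come from the paper: n training examples, an epoch of T steps, B_t the batch used at step t, and, for Poisson subsampling, a sampling rate q = 1/T chosen so that all three schemes have the same expected batch size. It is an idealized sketch and says nothing by itself about privacy accounting.

```latex
\begin{aligned}
&\text{Shuffling:} && |B_t| = n/T \ (\text{when } T \mid n), && \text{uses of each example per epoch} = 1,\\
&\text{Balls-and-Bins:} && |B_t| \sim \mathrm{Binomial}(n,\, 1/T), && \text{uses of each example per epoch} = 1,\\
&\text{Poisson } (q = 1/T)\text{:} && |B_t| \sim \mathrm{Binomial}(n,\, 1/T), && \text{uses of each example per epoch} \sim \mathrm{Binomial}(T,\, 1/T).
\end{aligned}

\mathbb{E}|B_t| = \frac{n}{T}, \qquad
\operatorname{Var}|B_t| = \frac{n}{T}\left(1 - \frac{1}{T}\right), \qquad
\Pr[\text{example unused in an epoch under Poisson}] = \left(1 - \frac{1}{T}\right)^{T} \approx e^{-1}.
```

The mean and variance apply to both random schemes. In this view, Balls-and-Bins inherits the random batch sizes of Poisson subsampling but, like shuffling, visits every example exactly once per epoch.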

📝 Abstract

We introduce Balls-and-Bins sampling for differentially private (DP) optimization methods such as DP-SGD. While it has been common practice to use some form of shuffling in DP-SGD implementations, privacy accounting algorithms have typically assumed that Poisson subsampling is used instead. Recent work by Chua et al. (ICML 2024), however, pointed out that shuffling-based DP-SGD can have a much larger privacy cost in practical regimes of parameters. In this work we show that Balls-and-Bins sampling achieves the "best of both" samplers. Its implementation is similar to that of shuffling, and models trained using DP-SGD with Balls-and-Bins sampling achieve utility comparable to those trained using DP-SGD with shuffling at the same noise multiplier. Yet Balls-and-Bins sampling enjoys similar-or-better privacy amplification compared to Poisson subsampling in practical regimes.
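The following is a minimal Python sketch of the three batch samplers the abstract compares. It assumes a dataset indexed 0..n-1 and an epoch of num_steps steps. The function names (shuffle_batches, balls_and_bins_batches, poisson_batches) are hypothetical and do not come from the paper's code. Balls-and-Bins is modeled here as each example being assigned to one uniformly random step, independently of all other examples. The sketch only produces batch indices; it does no privacy accounting.

```python
import random
from collections import Counter
from typing import List

Batches = List[List[int]]


def shuffle_batches(n: int, num_steps: int, rng: random.Random) -> Batches:
    """Shuffling: cut one random permutation into num_steps consecutive batches."""
    perm = list(range(n))
    rng.shuffle(perm)
    base, extra = divmod(n, num_steps)  # batch sizes differ by at most one
    batches, start = [], 0
    for t in range(num_steps):
        size = base + (1 if t < extra else 0)
        batches.append(perm[start:start + size])
        start += size
    return batches


def balls_and_bins_batches(n: int, num_steps: int, rng: random.Random) -> Batches:
    """Balls-and-Bins (hypothetical helper): each example (ball) lands in one
    uniformly random step (bin), independently of every other example."""
    batches: Batches = [[] for _ in range(num_steps)]
    for i in range(n):
        batches[rng.randrange(num_steps)].append(i)
    return batches


def poisson_batches(n: int, num_steps: int, q: float, rng: random.Random) -> Batches:
    """Poisson subsampling: at every step, include each example independently
    with probability q."""
    return [[i for i in range(n) if rng.random() < q] for _ in range(num_steps)]


if __name__ == "__main__":
    rng = random.Random(0)
    n, T = 1000, 10
    samplers = {
        "shuffling": shuffle_batches(n, T, rng),
        "balls-and-bins": balls_and_bins_batches(n, T, rng),
        "poisson": poisson_batches(n, T, 1 / T, rng),
    }
    for name, batches in samplers.items():
        # Count how often each example is used during the epoch.
        uses = Counter(i for batch in batches for i in batch)
        exactly_once = sum(1 for i in range(n) if uses[i] == 1)
        print(f"{name:15s} batch sizes={[len(b) for b in batches]} "
              f"examples used exactly once={exactly_once}/{n}")
```

Running it shows the trade-off the abstract describes. Balls-and-Bins, like shuffling, makes exactly one pass over the data per epoch, so a shuffling-based input pipeline only needs its step-assignment logic changed. Its batch sizes, however, vary from step to step as under Poisson subsampling.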

Problem

Introducing Balls-and-Bins sampling for DP-SGD optimization

Comparing privacy costs of shuffling vs. Poisson subsampling in DP-SGD

Achieving privacy amplification on par with or better than Poisson subsampling via Balls-and-Bins sampling

Innovation

Balls-and-Bins sampling for DP-SGD optimization

Combines shuffling-like implementation with privacy benefits

Matches utility of shuffling with better privacy amplification

🔎 Similar Papers

Multiple importance sampling for stochastic gradient estimation