Optimal Regret of Bernoulli Bandits under Global Differential Privacy

📅 2025-05-08

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This paper addresses the significant constant-factor gap between the best-known regret lower and upper bounds for Bernoulli multi-armed bandits under ε-global differential privacy. Method: We introduce a novel information-theoretic quantity that characterises the hardness of ε-global DP in stochastic bandits; design DP-KLUCB and DP-IMED, private versions of KL-UCB and IMED that run in arm-dependent phases and add Laplace noise, showing that forgetting past rewards is not necessary for optimality under global DP; and derive a DP version of the Chernoff bound that couples Laplace noise with sums of Bernoulli observations. Contributions/Results: We prove a tighter regret lower bound that strictly improves on the existing ones for every ε; both proposed algorithms achieve regret that asymptotically matches this bound up to a constant arbitrarily close to 1.

📝 Abstract

As sequential learning algorithms are increasingly applied to real life, ensuring data privacy while maintaining their utilities emerges as a timely question. In this context, regret minimisation in stochastic bandits under $\epsilon$-global Differential Privacy (DP) has been widely studied. Unlike bandits without DP, there is a significant gap between the best-known regret lower and upper bounds in this setting, though they "match" in order. Thus, we revisit the regret lower and upper bounds of $\epsilon$-global DP algorithms for Bernoulli bandits and improve both. First, we prove a tighter regret lower bound involving a novel information-theoretic quantity characterising the hardness of $\epsilon$-global DP in stochastic bandits. Our lower bound strictly improves on the existing ones across all $\epsilon$ values. Then, we choose two asymptotically optimal bandit algorithms, i.e., KL-UCB and IMED, and propose their DP versions, DP-KLUCB and DP-IMED, using a unified blueprint, i.e., (a) running in arm-dependent phases, and (b) adding Laplace noise to achieve privacy. For Bernoulli bandits, we analyse the regrets of these algorithms and show that they asymptotically match our lower bound up to a constant arbitrarily close to 1. This refutes the conjecture that forgetting past rewards is necessary to design optimal bandit algorithms under global DP. At the core of our algorithms lies a new concentration inequality for sums of Bernoulli variables under the Laplace mechanism, which is a new DP version of the Chernoff bound. This result is broadly useful because the DP literature commonly treats the concentrations of Laplace noise and random variables separately, while we couple them to yield a tighter bound.
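To make step (b) of the blueprint concrete, here is a minimal sketch of the standard Laplace mechanism applied to a sum of Bernoulli rewards. It is an illustration under stated assumptions, not the paper's algorithm: the helper names `laplace_noise` and `private_mean` are hypothetical, and the sketch assumes each user contributes one reward in {0, 1}, so that the sum has sensitivity 1 and Laplace noise of scale 1/ε suffices for ε-DP.

```python
import random


def laplace_noise(scale, rng):
    # A Laplace(0, scale) variable is the difference of two independent
    # exponential variables with mean `scale` (rate 1 / scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_mean(rewards, epsilon, rng):
    """Noisy empirical mean of rewards in {0, 1} (hypothetical helper).

    Assumption: changing one reward changes the sum by at most 1,
    so adding Laplace(1 / epsilon) noise to the sum gives epsilon-DP.
    """
    if not rewards:
        raise ValueError("need at least one reward")
    noisy_sum = sum(rewards) + laplace_noise(1.0 / epsilon, rng)
    return noisy_sum / len(rewards)


rng = random.Random(0)
# Simulated Bernoulli(0.7) rewards for a single arm.
rewards = [1 if rng.random() < 0.7 else 0 for _ in range(1000)]
estimate = private_mean(rewards, epsilon=1.0, rng=rng)
print(f"empirical mean: {sum(rewards) / len(rewards):.3f}")
print(f"private mean:   {estimate:.3f}")
```

In a phased algorithm, a call like this would presumably be made once per phase of an arm; the phase schedule is not specified in the text above, so it is left out of the sketch.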

Problem

Research questions and friction points this paper is trying to address.

Minimizing regret in Bernoulli bandits under global differential privacy

Bridging the gap between lower and upper regret bounds for DP bandits

Designing optimal DP bandit algorithms without forgetting past rewards

Innovation

Methods, ideas, or system contributions that make the work stand out.

Tighter regret lower bound for DP bandits

DP-KLUCB and DP-IMED, built on a unified blueprint of arm-dependent phases and Laplace noise

New DP Chernoff bound for Bernoulli variables
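For background on the points above, the sketch below computes the standard, non-private KL-UCB index for a Bernoulli arm, which rests on the classical Chernoff bound in its KL form. It is not the paper's DP variant: the private index and the new DP Chernoff bound are not given in this summary and are not reproduced here. The function names and the plain `log(t)` exploration budget are assumptions chosen for illustration.

```python
import math


def bernoulli_kl(p, q):
    # KL divergence between Bernoulli(p) and Bernoulli(q),
    # with both arguments clipped away from 0 and 1 for stability.
    eps = 1e-12
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))


def kl_ucb_index(mean, pulls, t, tol=1e-9):
    """Largest q in [mean, 1] with pulls * kl(mean, q) <= log(t).

    Non-private index; kl(mean, q) is increasing in q on [mean, 1],
    so bisection finds the boundary.
    """
    budget = math.log(t) / pulls
    lo, hi = mean, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if bernoulli_kl(mean, mid) <= budget:
            lo = mid
        else:
            hi = mid
    return lo


index_few = kl_ucb_index(mean=0.5, pulls=10, t=100)
index_many = kl_ucb_index(mean=0.5, pulls=100, t=100)
print(f"index after 10 pulls:  {index_few:.4f}")
print(f"index after 100 pulls: {index_many:.4f}")
```

The index shrinks towards the empirical mean as the number of pulls grows, which is the behaviour a private version would also need to preserve once Laplace noise is taken into account.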
