Connecting Thompson Sampling and UCB: Towards More Efficient Trade-offs Between Privacy and Regret

📅 2025-05-05
🤖 AI Summary
This paper studies the stochastic multi-armed bandit problem under differential privacy, aiming to unify the exploration mechanisms of Thompson sampling (TS) and upper confidence bound (UCB) while establishing a theoretical connection to Gaussian differential privacy (GDP). To this end, we propose DP-TS-UCB—a parameterized algorithm that continuously controls the privacy–utility trade-off via a tunable parameter α. It is the first framework to jointly integrate Gaussian-prior TS, Gaussian noise injection, and the GDP analytical framework. Leveraging Gaussian anti-concentration bounds, we derive an Õ(T^{0.25(1−α)})-GDP privacy guarantee and an O(K ln^{α+1}(T)/Δ) regret upper bound. Our work reveals an intrinsic consistency between TS and UCB under privacy constraints and establishes a new paradigm for privacy-preserving sequential decision-making—one that offers both rigorous theoretical guarantees and practical flexibility.

📝 Abstract
We address differentially private stochastic bandit problems by exploring the deep connections among Thompson Sampling with Gaussian priors, Gaussian mechanisms, and Gaussian differential privacy (GDP). We propose DP-TS-UCB, a novel parametrized private bandit algorithm that enables a trade-off between privacy and regret. DP-TS-UCB satisfies $\tilde{O}\left(T^{0.25(1-\alpha)}\right)$-GDP and enjoys an $O\left(K\ln^{\alpha+1}(T)/\Delta\right)$ regret bound, where $\alpha \in [0,1]$ controls the trade-off between privacy and regret. Theoretically, DP-TS-UCB relies on anti-concentration bounds of Gaussian distributions and links the exploration mechanisms of Thompson Sampling-based algorithms and Upper Confidence Bound-based algorithms, which may be of independent interest.
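To make the role of α concrete, the two rates stated in the abstract can be evaluated numerically. The sketch below is only an illustration: it drops all constants and the logarithmic factors hidden in the Õ notation, and the function name `tradeoff_rates` and the sample values of T, K, and Δ are assumptions chosen for the example, not values from the paper.

```python
import math

def tradeoff_rates(alpha, T, K, gap):
    """Evaluate the two rates from the abstract, constants and hidden log factors dropped.

    Returns (privacy_rate, regret_rate), where
      privacy_rate = T^{0.25 (1 - alpha)}   (scaling of the GDP parameter)
      regret_rate  = K * ln(T)^{alpha + 1} / gap
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if T <= 1 or K < 1 or gap <= 0:
        raise ValueError("need T > 1, K >= 1, gap > 0")
    privacy_rate = T ** (0.25 * (1.0 - alpha))
    regret_rate = K * math.log(T) ** (alpha + 1.0) / gap
    return privacy_rate, regret_rate

# Hypothetical problem size, used only to show the direction of the trade-off.
for alpha in (0.0, 0.5, 1.0):
    mu_rate, regret_rate = tradeoff_rates(alpha, T=10_000, K=5, gap=0.1)
    print(f"alpha={alpha:.1f}  GDP rate ~ {mu_rate:.2f}  regret rate ~ {regret_rate:.1f}")
```

Since a smaller GDP parameter means stronger privacy, the formulas imply that moving α toward 1 tightens the privacy guarantee while adding a factor of up to ln(T) to the regret bound, and moving α toward 0 does the opposite.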
Problem

Research questions and friction points this paper is trying to address.

Exploring connections between Thompson Sampling and UCB for privacy
Developing DP-TS-UCB to balance privacy and regret trade-offs
Analyzing Gaussian mechanisms for differential privacy in bandit problems
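As background for the Gaussian-mechanism point above, here is a minimal sketch of the textbook Gaussian mechanism applied to one arm's empirical mean. It is not the paper's algorithm. It assumes rewards are bounded in [0, 1], so replacing one reward changes the reward sum by at most 1, and it uses the standard fact that adding N(0, σ²) noise to a statistic with sensitivity s gives (s/σ)-GDP. The function name `private_mean` and its parameters are hypothetical.

```python
import random

def private_mean(rewards, mu, rng=None):
    """Release a noisy empirical mean of rewards in [0, 1] under mu-GDP.

    Assumption: neighbouring datasets differ in one reward, so the sum has
    sensitivity 1 and Gaussian noise with sigma = 1 / mu yields mu-GDP.
    """
    if mu <= 0:
        raise ValueError("mu must be positive")
    if not rewards:
        raise ValueError("need at least one reward")
    if any(not 0.0 <= r <= 1.0 for r in rewards):
        raise ValueError("rewards must lie in [0, 1]")
    rng = rng or random.Random()
    sensitivity = 1.0
    sigma = sensitivity / mu
    noisy_sum = sum(rewards) + rng.gauss(0.0, sigma)
    # Dividing by the (public) count is post-processing and keeps the guarantee.
    return noisy_sum / len(rewards)

rewards = [0.2, 0.8, 0.5, 0.5]
print(private_mean(rewards, mu=1.0, rng=random.Random(0)))
```

A smaller `mu` means a larger noise scale and therefore a less accurate mean estimate, which is the basic tension between privacy and regret that the bullets above describe.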
Innovation

Methods, ideas, or system contributions that make the work stand out.

Combines Thompson Sampling and UCB for privacy-regret trade-off
Uses Gaussian mechanisms for differential privacy guarantees
Links exploration mechanisms in Thompson Sampling and UCB
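The link between the two exploration mechanisms can be seen informally by writing both indices side by side. The sketch below is a generic illustration, not DP-TS-UCB: it assumes a Gaussian Thompson Sampling draw with variance 1/n for an arm pulled n times and the classical UCB1 bonus sqrt(2 ln t / n). Both choices, and the function names, are assumptions made for the example.

```python
import math
import random

def ts_index(mean, pulls, rng):
    """Thompson Sampling style index: one draw from N(mean, 1 / pulls)."""
    return rng.gauss(mean, 1.0 / math.sqrt(pulls))

def ucb_index(mean, pulls, t):
    """UCB1 style index: empirical mean plus a deterministic bonus."""
    return mean + math.sqrt(2.0 * math.log(t) / pulls)

rng = random.Random(0)
mean, pulls, t = 0.5, 25, 1000
samples = [ts_index(mean, pulls, rng) for _ in range(5)]
print("TS draws :", [round(s, 3) for s in samples])
print("UCB index:", round(ucb_index(mean, pulls, t), 3))
```

In both cases the index is the empirical mean plus an exploration term that shrinks like 1/sqrt(n). Thompson Sampling makes that term random, while UCB fixes it at an optimistic value.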
Bingshan Hu
Data Science Institute at University of British Columbia
Machine learning theory, differential privacy, wireless communications
Zhiming Huang
University of Victoria
Bandit Optimization, Game Theory, Computer Network, Wireless Communication
Tianyue H. Zhang
Université de Montréal, Canada; Mila – Quebec AI Institute, Canada
Mathias Lécuyer
Department of Computer Science, University of British Columbia, Canada
Nidhi Hegde
Department of Computing Science, University of Alberta, Canada; Alberta Machine Intelligence Institute (Amii), Canada