🤖 AI Summary
This work addresses privacy-preserving learning in generalized linear contextual bandits under both stochastic and adversarial context settings, proposing novel algorithms that satisfy shuffle differential privacy (SDP) and joint differential privacy (JDP), respectively. To handle the lack of closed-form estimators, the authors employ private convex optimization and explicitly incorporate the resulting optimization error into the regret analysis. Key contributions include the first SDP and JDP guarantees for generalized linear contextual bandits, removal of dependence on the instance-specific parameter κ in the regret bounds, and elimination of spectral assumptions on the context distribution, requiring only ℓ₂-boundedness. The resulting regret bounds are Õ(d^{3/2}√T/√ε) in the stochastic setting and Õ(d√T/√ε) in the adversarial setting, with the latter matching the non-private optimal rate up to a 1/√ε factor.
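Because GLMs admit no closed-form estimator, the parameter must be fit by private convex optimization. The paper's actual shuffle/joint-DP mechanisms are not reproduced here; the following is only a minimal illustrative sketch of the general idea (noisy gradient descent on a logistic GLM loss), where `noise_scale`, the step count, and the learning rate are placeholder assumptions rather than the authors' calibrated choices.

```python
import numpy as np

def private_glm_estimate(X, y, noise_scale=1.0, lr=0.1, steps=200, seed=0):
    """Noisy gradient descent on the logistic (GLM) log-loss.

    Illustrative only: a real SDP/JDP mechanism calibrates the noise to
    the privacy budget and tracks composition across rounds; here
    `noise_scale` is just a stand-in for the per-step Gaussian perturbation.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    theta = np.zeros(d)
    for _ in range(steps):
        mu = 1.0 / (1.0 + np.exp(-X @ theta))        # GLM mean via logistic link
        grad = X.T @ (mu - y) / n                    # average log-loss gradient
        grad += rng.normal(0.0, noise_scale / n, d)  # privacy-motivated noise
        theta -= lr * grad
    return theta
```

The noisy-gradient view also makes the summary's point concrete: the optimizer returns only an approximate minimizer, so the optimization error it leaves behind must be carried explicitly through the regret analysis.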
📝 Abstract
We present the first algorithms for generalized linear contextual bandits under shuffle differential privacy and joint differential privacy. While prior work on private contextual bandits has been restricted to linear reward models -- which admit closed-form estimators -- generalized linear models (GLMs) pose fundamental new challenges: no closed-form estimator exists, requiring private convex optimization; privacy must be tracked across multiple evolving design matrices; and optimization error must be explicitly incorporated into regret analysis. We address these challenges under two pairings of privacy model and context setting. For stochastic contexts, we design a shuffle-DP algorithm achieving $\tilde{O}(d^{3/2}\sqrt{T}/\sqrt{\varepsilon})$ regret. For adversarial contexts, we provide a joint-DP algorithm with $\tilde{O}(d\sqrt{T}/\sqrt{\varepsilon})$ regret -- matching the non-private rate up to a $1/\sqrt{\varepsilon}$ factor. Both algorithms remove dependence on the instance-specific parameter $\kappa$ (which can be exponential in dimension) from the dominant $\sqrt{T}$ term. Unlike prior work on locally private GLM bandits, our methods require no spectral assumptions on the context distribution beyond $\ell_2$ boundedness.