Variance-sensitive Thompson sampling for generalised linear bandits, revisited

📅 2026-05-29

🤖 AI Summary
This work addresses a limitation of conventional optimism-based analyses in stochastic generalised linear bandits, which struggle to yield variance-aware regret bounds. The analysis brings the Gaussian Poincaré inequality into the study of Thompson sampling. After an initial warm-up phase, this functional inequality is used to control the regret, which bypasses the step at which optimism-based arguments break down and yields a regret upper bound that adapts to the reward variance. The result strengthens the theoretical support for Thompson sampling in generalised linear bandits and suggests functional inequalities as a tool for analysing Bayesian bandit algorithms. Whether the warm-up can be removed while keeping the same variance-sensitive scaling remains open.
📝 Abstract
We prove a variance-sensitive regret bound for Thompson sampling in stochastic generalised linear bandits. The argument assumes a warm-up phase, after which the regret is controlled using the Gaussian Poincaré inequality. This bypasses the point at which previous optimism-based analyses break down. Removing the warm-up while retaining the same variance-sensitive scaling remains open and appears nontrivial.
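For reference, the Gaussian Poincaré inequality named in the abstract can be stated as follows. This is the standard textbook form, not a statement taken from the paper: the symbols f, X, d, \mu and \Sigma are notation chosen here for illustration, and the abstract does not say which function or which Gaussian the argument applies it to.

```latex
% Gaussian Poincare inequality, standard form.
% Assumption: f : R^d -> R is continuously differentiable with
% E[ ||grad f(X)||^2 ] finite; notation is illustrative only.
\[
  X \sim \mathcal{N}(0, I_d)
  \quad\Longrightarrow\quad
  \operatorname{Var}\bigl(f(X)\bigr)
  \;\le\;
  \mathbb{E}\bigl[\lVert \nabla f(X) \rVert_2^2\bigr].
\]
% Equivalent form for a general Gaussian, obtained by the change of
% variables X = \mu + \Sigma^{1/2} Z with Z ~ N(0, I_d):
\[
  X \sim \mathcal{N}(\mu, \Sigma)
  \quad\Longrightarrow\quad
  \operatorname{Var}\bigl(f(X)\bigr)
  \;\le\;
  \mathbb{E}\bigl[\nabla f(X)^{\top} \Sigma \, \nabla f(X)\bigr].
\]
```

The second form shows why the inequality is a natural fit for a sampling-based algorithm: the variance of any smooth function of a Gaussian sample is bounded by its gradient measured in the covariance \Sigma, so a tighter sampling distribution gives a smaller bound.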
Problem

Research questions and friction points this paper is trying to address.

variance-sensitive
Thompson sampling
generalised linear bandits
regret bound
Innovation

Methods, ideas, or system contributions that make the work stand out.

variance-sensitive regret
Thompson sampling
generalised linear bandits
Gaussian Poincaré inequality
stochastic bandits