Variance-sensitive Thompson sampling for generalised linear bandits, revisited

📅 2026-05-29

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work proves a variance-sensitive regret bound for Thompson sampling in stochastic generalised linear bandits, a setting where earlier optimism-based analyses break down. The analysis assumes an initial warm-up phase, after which the regret is controlled using the Gaussian Poincaré inequality rather than optimistic confidence sets, giving an upper bound that adapts to the reward variance. Whether the warm-up can be removed while keeping the same variance-sensitive scaling is left open and appears nontrivial.

📝 Abstract

We prove a variance-sensitive regret bound for Thompson sampling in stochastic generalised linear bandits. The argument assumes a warm-up, after which the regret is controlled using the Gaussian Poincaré inequality. This bypasses the point at which previous optimism-based analyses break down. Removing the warm-up while retaining the same variance-sensitive scaling remains open, and appears nontrivial.
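For readers unfamiliar with the tool named above: the Gaussian Poincaré inequality states that for X ~ N(0, I_d) and a smooth function f, Var f(X) <= E ||grad f(X)||^2. The sketch below is only a numerical sanity check of that inequality, not the paper's argument. The logistic link, the parameter vector `theta`, and the helper name `poincare_check` are illustrative assumptions that do not come from the abstract.

```python
import numpy as np


def sigmoid(z):
    """Logistic link, used here purely as an example of a smooth GLM mean function."""
    return 1.0 / (1.0 + np.exp(-z))


def poincare_check(theta, n_samples=200_000, seed=0):
    """Monte Carlo estimates of both sides of Var f(X) <= E ||grad f(X)||^2
    for X ~ N(0, I_d) and f(x) = sigmoid(<theta, x>)."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((n_samples, theta.shape[0]))
    f = sigmoid(x @ theta)
    # grad f(x) = sigmoid'(<theta, x>) * theta, and sigmoid' = f * (1 - f),
    # so ||grad f(x)||^2 = (f * (1 - f))^2 * ||theta||^2.
    grad_sq = (f * (1.0 - f)) ** 2 * float(theta @ theta)
    return f.var(), grad_sq.mean()


# Hypothetical parameter vector, chosen only for illustration.
theta = np.array([0.5, -1.0, 2.0])
variance, gradient_energy = poincare_check(theta)
print(f"Var f(X) = {variance:.4f}, E||grad f(X)||^2 = {gradient_energy:.4f}")
```

The left-hand side is the fluctuation of a function of a Gaussian sample and the right-hand side depends only on the gradient of that function. This is the sense in which the inequality can turn control of gradients into control of variance. How the paper applies it to the Thompson sampling posterior is not specified in the abstract.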

Problem

Research questions and friction points this paper is trying to address.

variance-sensitive

Thompson sampling

generalised linear bandits

regret bound

Innovation

Methods, ideas, or system contributions that make the work stand out.

variance-sensitive regret

Thompson sampling

generalised linear bandits

Gaussian Poincaré inequality