First Worst-Case Regret Bounds for Combinatorial Thompson Sampling in Sleeping Semi-Bandits

📅 2026-05-09

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses three long-standing theoretical gaps in Combinatorial Thompson Sampling (CTS) for sleeping semi-bandits: the absence of worst-case regret bounds, the lack of guarantees under adversarially varying arm availability, and the weak empirical performance of CTS with Gaussian priors (CTS-G). The paper gives the first worst-case regret analysis of CTS-G, proving an upper bound of $\tilde{O}(m\sqrt{NT})$ together with a matching lower bound. It further introduces CL-SG, a CTS-G variant that samples a single shared Gaussian seed each round to coordinate exploration across arms while respecting combinatorial constraints and time-varying arm availability. CL-SG improves the regret bound to $\tilde{O}(\sqrt{mNT})$, again with a matching lower bound. Experiments on real-world datasets show that CL-SG consistently outperforms baselines such as CTS-G and CTS-B.
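
To make the size of the improvement concrete, the two stated rates can be compared directly. Ignoring the logarithmic factors hidden by the tilde notation, the CTS-G rate exceeds the CL-SG rate by a factor of $\sqrt{m}$:

$$\frac{m\sqrt{NT}}{\sqrt{mNT}} = \frac{m}{\sqrt{m}} = \sqrt{m}.$$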

📝 Abstract

We revisit combinatorial Thompson sampling (CTS) for semi-bandits with sleeping arms, where arm availability varies over time and actions must satisfy combinatorial constraints, as in wireless mesh routing with fluctuating link availability. Despite its practical relevance, CTS has been hindered by several long-standing problems: (i) the absence of worst-case regret guarantees in the semi-bandit setting even without sleeping arms, (ii) the lack of theory under adversarially varying availability, and (iii) the consistently weak empirical performance of CTS with Gaussian priors (CTS-G). This paper resolves these long-standing issues by providing the first worst-case regret analysis of CTS-G, proving an upper bound of $\tilde{O}(m\sqrt{NT})$ and a matching lower bound of $\tilde{\Omega}(m\sqrt{NT})$. To bridge the gap between theory and practice, we further propose CL-SG, a simple CTS-G variant that samples a single shared Gaussian seed each round to coordinate exploration across arms. We show that CL-SG achieves an improved regret bound of $\tilde{O}(\sqrt{mNT})$, together with a matching lower bound $\Omega(\sqrt{mNT})$. Experiments on real-world datasets demonstrate that CL-SG consistently outperforms strong baselines including CTS-G and CTS-B, and we open-source our implementation for reproducibility.
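
The difference between per-arm Gaussian sampling and a single shared Gaussian seed can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not give the exact update rules, so the posterior width `1 / sqrt(count)`, the function names `sample_scores` and `choose_action`, and the "pick the top `m` awake arms" oracle are all assumptions made purely for illustration.

```python
import math
import random


def sample_scores(means, counts, rng, shared_seed):
    """Perturb each arm's empirical mean with Gaussian noise.

    shared_seed=False: an independent draw per arm (CTS-G style).
    shared_seed=True: one draw per round reused by every arm
    (the shared-seed idea described for CL-SG).
    The 1/sqrt(count) width is an assumed, illustrative choice.
    """
    widths = [1.0 / math.sqrt(max(n, 1)) for n in counts]
    if shared_seed:
        z = rng.gauss(0.0, 1.0)  # single scalar for the whole round
        draws = [z] * len(means)
    else:
        draws = [rng.gauss(0.0, 1.0) for _ in means]
    return [mu + z_i * w for mu, z_i, w in zip(means, draws, widths)]


def choose_action(scores, available, m):
    """Hypothetical oracle: the m highest-scoring arms that are awake."""
    awake = [i for i, ok in enumerate(available) if ok]
    awake.sort(key=lambda i: scores[i], reverse=True)
    return sorted(awake[:m])


rng = random.Random(0)
means = [0.2, 0.5, 0.4, 0.9]            # illustrative empirical means
counts = [3, 10, 1, 5]                  # illustrative pull counts
available = [True, True, True, False]   # arm 3 is asleep this round
scores = sample_scores(means, counts, rng, shared_seed=True)
action = choose_action(scores, available, m=2)
```

With `shared_seed=True`, every arm's perturbation has the same sign and the same standardized magnitude, so the arms are optimistic or pessimistic together in a given round; with `shared_seed=False`, the perturbations are independent across arms. In both cases the sleeping arm is excluded before the action is chosen.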

Problem

Research questions and friction points this paper is trying to address.

Combinatorial Thompson Sampling

Sleeping Semi-Bandits

Worst-Case Regret

Adversarial Availability

Gaussian Priors

Innovation

Methods, ideas, or system contributions that make the work stand out.

Combinatorial Thompson Sampling

Sleeping Semi-Bandits

Worst-Case Regret Bounds