AI Summary
This work addresses long-standing gaps in Combinatorial Thompson Sampling (CTS) for semi-bandits with sleeping arms: the absence of worst-case regret bounds, the lack of guarantees under adversarially varying arm availability, and the weak empirical performance of CTS with Gaussian priors (CTS-G). The paper gives the first worst-case regret analysis of CTS-G, proving an upper bound of $\tilde{O}(m\sqrt{NT})$ together with a matching lower bound. It further introduces CL-SG, a simple CTS-G variant that samples a single shared Gaussian seed each round to coordinate exploration across arms while respecting combinatorial constraints and time-varying arm availability. CL-SG improves the regret bound to $\tilde{O}(\sqrt{mNT})$, again with a matching lower bound. Experiments on real-world datasets show that CL-SG consistently outperforms baselines such as CTS-G and CTS-B, combining stronger theoretical guarantees with better practical performance.
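To make the size of the improvement concrete, divide the two upper bounds. This reads $N$ as the number of base arms, $m$ as the maximum number of arms played per round, and $T$ as the horizon; that is the usual semi-bandit notation, assumed here because the text above does not define the symbols:

$$
\frac{m\sqrt{NT}}{\sqrt{mNT}} = \frac{m}{\sqrt{m}} = \sqrt{m}.
$$

Up to logarithmic factors, CL-SG's bound is therefore smaller by a factor of $\sqrt{m}$; for example, with $m = 16$ the leading term shrinks fourfold.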
Abstract
We revisit combinatorial Thompson sampling (CTS) for semi-bandits with sleeping arms, where arm availability varies over time and actions must satisfy combinatorial constraints, as in wireless mesh routing with fluctuating link availability. Despite its practical relevance, CTS has been hindered by several long-standing problems: (i) the absence of worst-case regret guarantees in the semi-bandit setting, even without sleeping arms; (ii) the lack of theory under adversarially varying availability; and (iii) the consistently weak empirical performance of CTS with Gaussian priors (CTS-G). This paper resolves these issues by providing the first worst-case regret analysis of CTS-G, proving an upper bound of $\tilde{O}(m\sqrt{NT})$ and a matching lower bound of $\tilde{\Omega}(m\sqrt{NT})$. To bridge the gap between theory and practice, we further propose CL-SG, a simple CTS-G variant that samples a single shared Gaussian seed each round to coordinate exploration across arms. We show that CL-SG achieves an improved regret bound of $\tilde{O}(\sqrt{mNT})$, together with a matching lower bound of $\Omega(\sqrt{mNT})$. Experiments on real-world datasets demonstrate that CL-SG consistently outperforms strong baselines, including CTS-G and CTS-B, and we open-source our implementation for reproducibility.
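The abstract describes CL-SG only at a high level: one shared Gaussian seed per round, instead of CTS-G's independent per-arm Gaussian samples. The Python sketch below contrasts the two sampling schemes under loudly stated assumptions, and it is not the authors' implementation. The score form `mean + z / sqrt(n)`, the "pick the top `m` awake arms" oracle, Bernoulli rewards, and stochastic (rather than adversarial) arm availability are all illustrative choices, and the helper names `sample_scores`, `top_m_oracle`, `update`, and `simulate` are hypothetical.

```python
import math
import random


def sample_scores(means, counts, available, shared_seed, rng):
    """Draw one Gaussian score per available arm.

    Assumed (hypothetical) form: theta_i = mean_i + z_i / sqrt(max(n_i, 1)).
    shared_seed=False: every arm draws its own z_i ~ N(0, 1) (CTS-G-style).
    shared_seed=True: one z ~ N(0, 1) is drawn per round and reused by all
    arms (one reading of CL-SG's single shared seed).
    """
    shared_z = rng.gauss(0.0, 1.0) if shared_seed else None
    scores = {}
    for arm in available:
        z = shared_z if shared_seed else rng.gauss(0.0, 1.0)
        scores[arm] = means[arm] + z / math.sqrt(max(counts[arm], 1))
    return scores


def top_m_oracle(scores, m):
    """Assumed constraint: play at most m of the currently awake arms."""
    return sorted(scores, key=scores.get, reverse=True)[:m]


def update(means, counts, arm, reward):
    """Semi-bandit feedback: update the running mean of one played arm."""
    counts[arm] += 1
    means[arm] += (reward - means[arm]) / counts[arm]


def simulate(true_means, m, horizon, shared_seed, awake_prob=0.8, seed=0):
    """Toy run with Bernoulli rewards and randomly sleeping arms."""
    rng = random.Random(seed)
    n_arms = len(true_means)
    means = [0.0] * n_arms
    counts = [0] * n_arms
    total = 0.0
    for _ in range(horizon):
        # Each arm is awake independently with probability awake_prob
        # (a stochastic stand-in for adversarially varying availability).
        available = [i for i in range(n_arms) if rng.random() < awake_prob]
        if not available:
            continue
        scores = sample_scores(means, counts, available, shared_seed, rng)
        for arm in top_m_oracle(scores, m):
            reward = 1.0 if rng.random() < true_means[arm] else 0.0
            update(means, counts, arm, reward)
            total += reward
    return total, counts


if __name__ == "__main__":
    arms = [0.9, 0.1, 0.5, 0.2, 0.7]
    for shared in (False, True):
        total, _ = simulate(arms, m=2, horizon=2000, shared_seed=shared)
        print("shared_seed" if shared else "independent", total)
```

With a shared `z`, every awake arm is pushed up or down together in a given round, so exploration is coordinated rather than independent. Under the assumed `1 / sqrt(n)` scaling, rarely played arms move the most. A single toy instance and seed like this one says nothing about the paper's empirical results; the paper's exact construction (variance scaling, oracle, handling of general combinatorial constraints) may differ.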