Thompson Sampling-like Algorithms for Stochastic Rising Bandits

📅 2025-05-17
📈 Citations: 0
Influential: 0
🤖 AI Summary
This paper studies the Stochastic Rising Rested Bandit (SRRB), a non-stationary multi-armed bandit setting in which each arm's expected reward increases monotonically with the number of times it has been pulled. The work addresses theoretical and algorithmic gaps for Thompson Sampling (TS)-based methods in such environments, proposing and analyzing two TS variants, adapted and sliding-window TS, with novel mechanisms for Bayesian posterior updating and local reward estimation. The analysis derives sublinear regret upper bounds and introduces a new complexity index characterizing environmental regularity, together with a matching lower bound. Under mild conditions, the TS variants achieve strictly better regret guarantees than existing UCB-based approaches, and experiments on synthetic and real-world datasets corroborate these theoretical advantages.

📝 Abstract
The stochastic rising rested bandit (SRRB) is a setting where the arms' expected rewards increase as they are pulled. It models scenarios in which the performances of the different options grow as an effect of an underlying learning process (e.g., online model selection). Although the bandit literature provides algorithms specifically crafted for this setting based on upper-confidence bounds, no study of Thompson Sampling (TS)-like algorithms has been performed so far. The strong regularity of the expected rewards in the SRRB setting suggests that specific instances may be tackled effectively using adapted and sliding-window TS approaches. This work provides novel regret analyses for such algorithms in SRRBs, highlighting the challenges and providing new technical tools of independent interest. Our results allow us to identify under which assumptions TS-like algorithms succeed in achieving sublinear regret and which properties of the environment govern the complexity of the regret minimization problem when approached with TS. Furthermore, we provide a regret lower bound based on a complexity index we introduce. Finally, we conduct numerical simulations comparing TS-like algorithms with state-of-the-art approaches for SRRBs in synthetic and real-world settings.
Problem

Research questions and friction points this paper is trying to address.

Studying Thompson Sampling algorithms for stochastic rising bandits
Analyzing regret performance in stochastic rising rested bandits
Comparing TS-like algorithms with state-of-the-art SRRB approaches
Innovation

Methods, ideas, or system contributions that make the work stand out.

Adapted Thompson Sampling for rising rewards
Sliding-window approach for dynamic bandits
Novel regret analysis with complexity index
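The sliding-window idea listed above can be illustrated with a minimal sketch. This is not the paper's algorithm: it assumes Gaussian posteriors over each arm's most recent rewards, so that stale (lower) observations from early in an arm's rising trajectory are discarded and the posterior tracks the improving mean. The class name, window size, and prior are illustrative choices.

```python
import math
import random
from collections import deque

class SlidingWindowTS:
    """Illustrative sliding-window Thompson Sampling sketch (not the paper's
    exact algorithm): each arm keeps only its last `window` rewards, and a
    Gaussian posterior sample over that window drives arm selection."""

    def __init__(self, n_arms, window=50, prior_std=1.0):
        self.prior_std = prior_std
        # deque(maxlen=window) silently drops the oldest reward on overflow.
        self.rewards = [deque(maxlen=window) for _ in range(n_arms)]

    def select_arm(self):
        samples = []
        for obs in self.rewards:
            n = len(obs)
            if n == 0:
                # Unpulled arm: sample from a wide prior to force exploration.
                samples.append(random.gauss(0.0, self.prior_std))
            else:
                mean = sum(obs) / n
                # Posterior std shrinks with the number of in-window samples.
                samples.append(random.gauss(mean, self.prior_std / math.sqrt(n)))
        return max(range(len(samples)), key=samples.__getitem__)

    def update(self, arm, reward):
        self.rewards[arm].append(reward)

# Usage on a toy rising instance: each arm's reward grows with its pull count.
if __name__ == "__main__":
    random.seed(0)
    ts = SlidingWindowTS(n_arms=3, window=20)
    pulls = [0, 0, 0]
    for t in range(500):
        arm = ts.select_arm()
        pulls[arm] += 1
        # Rising mean reward: saturating function of the arm's pull count.
        ts.update(arm, 1.0 - 1.0 / (1.0 + 0.1 * pulls[arm]))
```

The window trades off bias and variance: a short window reacts quickly to rising rewards but yields noisier posteriors, which is exactly the tension the paper's regret analysis quantifies through its complexity index.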