🤖 AI Summary
This paper studies adaptive decision-making in infinite-armed rotting bandits, where the mean reward of each arm decays with its number of pulls. It considers two generalized rotting constraints: (i) the cumulative rotting magnitude is bounded by $V_T$ ("slow rotting"), and (ii) the cumulative number of rotting events is bounded by $S_T$ ("abrupt rotting"). The authors propose the first unified adaptive algorithm for infinite-armed rotting bandits: UCB with an adaptive sliding window. By leveraging a bias-variance trade-off analysis and a dynamic arm-elimination mechanism, the algorithm achieves tight regret upper bounds of $O(\sqrt{V_T \log T})$ and $O(\sqrt{S_T \log T})$, respectively, relaxing the restrictive assumptions of stationary rewards and finitely many arms prevalent in prior work. Extensive experiments demonstrate that the method consistently outperforms existing baselines across diverse rotting patterns.
📝 Abstract
In this study, we consider the infinitely many-armed bandit problem in a rested rotting setting, where the mean reward of an arm may decrease with each pull and otherwise remains unchanged. We explore two scenarios regarding the rotting of rewards: one in which the cumulative amount of rotting is bounded by $V_T$, referred to as the slow-rotting case, and one in which the cumulative number of rotting instances is bounded by $S_T$, referred to as the abrupt-rotting case. To address the challenge posed by rotting rewards, we introduce an algorithm that utilizes UCB with an adaptive sliding window, designed to manage the bias-variance trade-off arising from rotting rewards. Our proposed algorithm achieves tight regret bounds for both the slow- and abrupt-rotting scenarios. Lastly, we demonstrate the performance of our algorithm through numerical experiments.
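The bias-variance trade-off the abstract describes can be illustrated with a minimal sketch: averaging only the most recent $h$ pulls of an arm limits the bias from rotting (older pulls had higher means), at the cost of higher variance from fewer samples. The following is an illustrative sliding-window UCB index, not the paper's exact algorithm; the window size `h` and confidence parameter `delta` are hypothetical choices, and the paper's adaptive window selection and arm-discarding rule are omitted.

```python
import math

def sliding_window_ucb(rewards, h, delta=0.01):
    """Illustrative UCB index computed from the last h pulls of one arm.

    rewards : full reward history of the arm (oldest first)
    h       : window size -- small h reduces rotting bias, large h
              reduces variance (the trade-off in the abstract)
    delta   : confidence parameter for the exploration bonus
    """
    window = rewards[-h:]                      # keep only the newest h pulls
    n = len(window)
    mean = sum(window) / n                     # windowed empirical mean
    bonus = math.sqrt(2 * math.log(1 / delta) / n)  # Hoeffding-style bonus
    return mean + bonus

# Toy arm that rotted abruptly: mean dropped from 1.0 to 0.0 mid-history.
history = [1.0] * 10 + [0.0] * 10
full_mean = sum(history) / len(history)        # 0.5, biased upward
window_mean = sum(history[-5:]) / 5            # 0.0, tracks the current mean
```

On this toy history, the full-history mean (0.5) overestimates the arm's current mean (0.0), while the windowed mean tracks it; the exploration bonus then accounts for the reduced sample size.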