🤖 AI Summary
This paper studies adaptive decision-making in infinite-armed rotting bandits, where the mean reward of each arm decays with its number of pulls. It considers two generalized rotting constraints: (i) the cumulative rotting magnitude is bounded by $V_T$ ("slow rotting"), and (ii) the cumulative number of rotting events is bounded by $S_T$ ("abrupt rotting"). The authors propose the first unified adaptive algorithm for infinite-armed rotting bandits: UCB with an adaptive sliding window. By leveraging a bias-variance trade-off analysis and a dynamic arm-elimination mechanism, the algorithm achieves tight regret upper bounds of $O(\sqrt{V_T \log T})$ and $O(\sqrt{S_T \log T})$, respectively, relaxing the restrictive assumptions of stationary rewards and finitely many arms prevalent in prior work. Extensive experiments demonstrate that the method consistently outperforms existing baselines across diverse rotting patterns.
📝 Abstract
In this study, we consider the infinitely many-armed bandit problem in a rested rotting setting, where the mean reward of an arm may decrease with each pull and otherwise remains unchanged. We explore two scenarios regarding the rotting of rewards: one in which the cumulative amount of rotting is bounded by $V_T$, referred to as the slow-rotting case, and one in which the cumulative number of rotting instances is bounded by $S_T$, referred to as the abrupt-rotting case. To address the challenge posed by rotting rewards, we introduce an algorithm that utilizes UCB with an adaptive sliding window, designed to manage the bias-variance trade-off arising from rotting rewards. Our proposed algorithm achieves tight regret bounds for both the slow- and abrupt-rotting scenarios. Lastly, we demonstrate the performance of our algorithm through numerical experiments.
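The bias-variance trade-off the abstract describes can be illustrated with a minimal sketch: averaging only the most recent $h$ pulls of an arm limits the bias from rotting (older pulls had higher means), at the cost of higher variance from fewer samples. The following is an illustrative sliding-window UCB index, not the paper's exact algorithm; the window size `h` and confidence parameter `delta` are hypothetical choices, and the paper's adaptive window selection and arm-discarding rule are omitted.

```python
import math

def sliding_window_ucb(rewards, h, delta=0.01):
    """Illustrative UCB index computed from the last h pulls of one arm.

    rewards : full reward history of the arm (oldest first)
    h       : window size -- small h reduces rotting bias, large h
              reduces variance (the trade-off in the abstract)
    delta   : confidence parameter for the exploration bonus
    """
    window = rewards[-h:]                      # keep only the newest h pulls
    n = len(window)
    mean = sum(window) / n                     # windowed empirical mean
    bonus = math.sqrt(2 * math.log(1 / delta) / n)  # Hoeffding-style bonus
    return mean + bonus

# Toy arm that rotted abruptly: mean dropped from 1.0 to 0.0 mid-history.
history = [1.0] * 10 + [0.0] * 10
full_mean = sum(history) / len(history)        # 0.5, biased upward
window_mean = sum(history[-5:]) / 5            # 0.0, tracks the current mean
```

On this toy history, the full-history mean (0.5) overestimates the arm's current mean (0.0), while the windowed mean tracks it; the exploration bonus then accounts for the reduced sample size.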