AI Summary
This work addresses multi-armed bandits with a dynamically growing arm set, where static regret against a single best arm in hindsight is no longer a suitable measure. The authors introduce a dynamic regret metric tied explicitly to the arm arrival process and propose UCB-AA, an elimination-based algorithm that pre-screens newly arrived arms before they compete with incumbent arms. Built on the UCB framework, the method combines sequential filtering, dynamic benchmark tracking, and online adaptation, which makes it applicable when the time horizon is unknown. Under regularity conditions on gap evolution, the algorithm is shown theoretically to achieve sublinear dynamic regret. Simulation results indicate that UCB-AA reduces wasted pulls, maintains a small active arm set, and retains competitive regret performance.
Abstract
We study a stochastic multi-armed bandit problem in which the set of available arms expands over time. This setting arises in sequential experimentation when new actions or treatments become available during an ongoing study, making regret against a single best arm in hindsight inappropriate. We instead evaluate performance relative to the best arm currently available, leading to a dynamic-regret criterion for arriving-arm environments. To address the resulting challenges of arrival information discrepancy (AID) and a drifting benchmark (DB), we propose UCB for Arriving Arms (UCB-AA), an elimination-based procedure with an auxiliary preliminary screening step applied to newly arrived arms before they compete fully with incumbent arms. We show that UCB-AA attains regret bounds that depend explicitly on the arrival process, achieves sublinear dynamic regret under regularity conditions on gap evolution, and admits an online extension for unknown horizons. Simulation results show that UCB-AA reduces wasted pulls and maintains a smaller active arm set while preserving competitive regret performance.
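To make the dynamic-regret criterion concrete, the following minimal sketch simulates an arriving-arm environment and charges each pull against the best arm available in that round, rather than the best arm in hindsight. It is not UCB-AA: the policy is plain UCB1 restricted to the arms that have arrived, with no screening or elimination step. The function name `simulate_arriving_arms`, the Bernoulli reward model, and the inputs `means` and `arrival_times` are all assumptions made for illustration.

```python
import math
import random


def simulate_arriving_arms(means, arrival_times, horizon, seed=0):
    """Run plain UCB1 over arrived arms and return (dynamic regret, pull counts).

    Hypothetical inputs: means[i] is the Bernoulli mean of arm i, and
    arrival_times[i] is the first round (0-indexed) in which arm i can be pulled.
    """
    rng = random.Random(seed)
    n_arms = len(means)
    pulls = [0] * n_arms
    reward_sums = [0.0] * n_arms
    dynamic_regret = 0.0

    for t in range(horizon):
        available = [i for i in range(n_arms) if arrival_times[i] <= t]
        if not available:
            raise ValueError("at least one arm must be available in every round")

        # Drifting benchmark: the best mean among arms available in round t.
        best_mean_now = max(means[i] for i in available)

        # Try each newly available arm once, then fall back to the UCB1 index.
        untried = [i for i in available if pulls[i] == 0]
        if untried:
            arm = untried[0]
        else:
            arm = max(
                available,
                key=lambda i: reward_sums[i] / pulls[i]
                + math.sqrt(2.0 * math.log(t + 1) / pulls[i]),
            )

        reward = 1.0 if rng.random() < means[arm] else 0.0
        pulls[arm] += 1
        reward_sums[arm] += reward

        # Per-round regret is measured against the current benchmark only.
        dynamic_regret += best_mean_now - means[arm]

    return dynamic_regret, pulls


if __name__ == "__main__":
    # Arm 2 has the highest mean but only arrives at round 200.
    regret, pulls = simulate_arriving_arms(
        means=[0.3, 0.5, 0.8], arrival_times=[0, 0, 200], horizon=1000, seed=1
    )
    print("dynamic regret:", regret)
    print("pulls per arm:", pulls)
```

In this sketch the late-arriving arm starts with no observations, which is one way to picture the arrival information discrepancy the abstract describes, and the benchmark jumps from 0.5 to 0.8 at round 200, which illustrates the drifting benchmark. A screening step such as the one UCB-AA uses would replace the "try once, then compete" rule here.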