A characterization of sample adaptivity in UCB data

2025-03-06
AI Summary
This paper studies the statistical effect of adaptive sampling in the Upper Confidence Bound (UCB) algorithm for stochastic two-armed bandits, focusing on the joint asymptotic distribution of the number of pulls and the sample mean reward of each arm. It establishes a joint central limit theorem (CLT) under UCB. One consequence is a nonstandard CLT for the number of pulls, and hence for the pseudo-regret, that interpolates smoothly between a standard form when the arm gap is large and a slow-concentration form when the gap is small. A second consequence is a heuristic derivation of the leading-order sampling bias induced by adaptivity, obtained from the correlation between the number of pulls and the sample means. The analysis rests on a novel perturbation framework that the authors consider to be of independent interest. Together, the results give a theoretical basis for statistical inference on adaptively collected bandit data.

Abstract
We characterize a joint CLT for the number of pulls and the sample mean reward of the arms in a stochastic two-armed bandit environment under UCB algorithms. This result has several implications: (1) a nonstandard CLT for the number of pulls, and hence for the pseudo-regret, that smoothly interpolates between a standard form in the large arm gap regime and a slow-concentration form in the small arm gap regime, and (2) a heuristic derivation of the sample bias, up to its leading order, from the correlation between the number of pulls and the sample means. Our analysis framework is based on a novel perturbation analysis, which is of broader interest in its own right.
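The quantities named in the abstract (number of pulls, sample mean rewards, pseudo-regret, sample bias) can be observed empirically with a small simulation. The following is a minimal sketch, not the paper's setup: it assumes unit-variance Gaussian rewards and the textbook UCB1 index (sample mean plus `sqrt(2 log t / n)`), and the names `run_ucb` and `summarize` are hypothetical helpers introduced here for illustration.

```python
import math
import random


def run_ucb(mu, horizon, rng):
    """Run a UCB1-style policy on a Gaussian bandit with arm means `mu`.

    Assumption: unit-variance Gaussian rewards and the UCB1 bonus
    sqrt(2 log t / n); the paper's exact index may differ.
    Returns (pulls, means): per-arm pull counts and sample mean rewards.
    """
    k = len(mu)
    pulls = [0] * k
    sums = [0.0] * k
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1  # pull every arm once so each index is defined
        else:
            index = [
                sums[a] / pulls[a] + math.sqrt(2.0 * math.log(t) / pulls[a])
                for a in range(k)
            ]
            arm = max(range(k), key=index.__getitem__)
        reward = rng.gauss(mu[arm], 1.0)
        pulls[arm] += 1
        sums[arm] += reward
    means = [sums[a] / pulls[a] for a in range(k)]
    return pulls, means


def summarize(mu, horizon, reps, seed=0):
    """Average pseudo-regret and per-arm sample-mean bias over `reps` runs."""
    rng = random.Random(seed)
    best = max(mu)
    k = len(mu)
    total_regret = 0.0
    total_bias = [0.0] * k
    for _ in range(reps):
        pulls, means = run_ucb(mu, horizon, rng)
        # Pseudo-regret: gap of each arm times the number of times it was pulled.
        total_regret += sum((best - mu[a]) * pulls[a] for a in range(k))
        for a in range(k):
            total_bias[a] += means[a] - mu[a]
    return total_regret / reps, [b / reps for b in total_bias]


if __name__ == "__main__":
    regret, bias = summarize(mu=[0.5, 0.0], horizon=2000, reps=500)
    print("mean pseudo-regret:", regret)
    print("mean sample-mean bias per arm:", bias)
```

Varying the gap between the two entries of `mu` while holding `horizon` fixed is one way to look at the large-gap and small-gap regimes the abstract contrasts; the averaged `bias` values give a Monte Carlo estimate of the sample bias under these assumptions.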
Problem

Research questions and friction points this paper is trying to address.

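Characterizes a joint CLT for the number of pulls and the sample mean rewards under UCB algorithms.
Derives a nonstandard CLT for the number of pulls, and hence the pseudo-regret, across large and small arm gap regimes.
Derives the leading-order sample bias from the correlation between the number of pulls and the sample means.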
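The link between sample bias and the correlation of pulls with sample means can be made concrete with a standard identity for adaptively collected data. The notation below is an assumption introduced here for illustration and is not taken from the paper: arm $a$ has mean $\mu_a$, $N_{a,T} \ge 1$ is its number of pulls after $T$ rounds, $X_{a,i}$ is the $i$-th reward observed from it, and arm 2 is the suboptimal arm.

```latex
\begin{align*}
\Delta &= \mu_1 - \mu_2 > 0, &
\hat{\mu}_{a,T} &= \frac{1}{N_{a,T}} \sum_{i=1}^{N_{a,T}} X_{a,i}, &
R_T &= \Delta \, N_{2,T} \quad \text{(pseudo-regret)}. \\[4pt]
\intertext{Because each arm choice depends only on past data,
$\mathbb{E}\bigl[N_{a,T}\,\hat{\mu}_{a,T}\bigr] = \mu_a\,\mathbb{E}[N_{a,T}]$, so}
\operatorname{Cov}\bigl(N_{a,T}, \hat{\mu}_{a,T}\bigr)
  &= \mu_a\,\mathbb{E}[N_{a,T}] - \mathbb{E}[N_{a,T}]\,\mathbb{E}[\hat{\mu}_{a,T}], \\[4pt]
\mathbb{E}[\hat{\mu}_{a,T}] - \mu_a
  &= -\,\frac{\operatorname{Cov}\bigl(N_{a,T}, \hat{\mu}_{a,T}\bigr)}{\mathbb{E}[N_{a,T}]}.
\end{align*}
```

Under this identity, a positive covariance between how often an arm is pulled and its sample mean corresponds to a negative bias of that sample mean. This is a general fact about adaptive sampling, stated here as background; it is not the paper's leading-order derivation.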
Innovation

Methods, ideas, or system contributions that make the work stand out.

Novel perturbation analysis framework
Characterizes joint CLT for UCB algorithms
Heuristic derivation of sample bias
Yilun Chen
School of Data Science, the Chinese University of Hong Kong, Shenzhen (CUHK Shenzhen)
Jiaqi Lu
School of Data Science, CUHK-SZ
matching market, supply chain management, customer relationship management