AI Summary
This paper investigates the statistical nature of sampling adaptivity in the Upper Confidence Bound (UCB) algorithm for stochastic two-armed bandits, focusing on the joint asymptotic distribution of arm selection counts and sample mean rewards. We establish, for the first time, a joint central limit theorem (CLT) under UCB, revealing nonstandard phase transitions in the convergence behavior of pseudo-regret across small- and large-gap regimes. We derive an explicit closed-form expression for the first-order sampling bias induced by adaptivity. To achieve this, we introduce a novel perturbation analysis framework that integrates the joint CLT, asymptotic inference, and bandit theory, yielding a unified characterization of pseudo-regret convergence and quantifying systematic bias arising from adaptive sampling. Our results provide a rigorous theoretical foundation for statistical calibration in adaptive data collection settings.
Abstract
We characterize a joint CLT for the number of pulls and the sample mean rewards of the arms in a stochastic two-armed bandit environment under UCB algorithms. This result has several implications: (1) a nonstandard CLT for the number of pulls, and hence for the pseudo-regret, that smoothly interpolates between a standard form in the large-arm-gap regime and a slow-concentration form in the small-arm-gap regime, and (2) a heuristic derivation of the sampling bias, up to its leading order, from the correlation between the number of pulls and the sample means. Our analysis is based on a novel perturbation framework, which is of independent interest.
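To make the objects in the abstract concrete, the sketch below simulates the UCB algorithm on a two-armed bandit and reports the two quantities the joint CLT concerns: the pull counts and the sample mean rewards. The Gaussian reward model, the exploration constant `c`, and the function name are illustrative assumptions, not specifics from the paper.

```python
import numpy as np

def ucb_two_armed(means, horizon, c=2.0, seed=None):
    """Simulate UCB on a two-armed bandit with Gaussian rewards.

    Returns (counts, sample_means): the number of pulls of each arm
    and the empirical mean reward of each arm. The reward model and
    bonus constant `c` are illustrative choices.
    """
    rng = np.random.default_rng(seed)
    counts = np.zeros(2, dtype=int)  # N_i(t): pulls of arm i so far
    sums = np.zeros(2)               # running sum of rewards per arm
    for t in range(horizon):
        if t < 2:
            arm = t  # pull each arm once to initialize the estimates
        else:
            sample_means = sums / counts
            # UCB index: empirical mean plus an exploration bonus
            bonus = np.sqrt(c * np.log(t) / counts)
            arm = int(np.argmax(sample_means + bonus))
        counts[arm] += 1
        sums[arm] += means[arm] + rng.standard_normal()
    return counts, sums / counts
```

In the large-gap regime (e.g. `means=[1.0, 0.0]`) the suboptimal arm's pull count grows only logarithmically, while in the small-gap regime both counts are of comparable order, which is the distinction behind the interpolating CLT described above.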