Statistical Inference under Adaptive Sampling with LinUCB

📅 2025-11-28
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the failure of classical statistical inference in LinUCB for linear contextual bandits—caused by adaptive sampling—we uncover an intrinsic stability property when the action set is the unit ball: the design matrix decomposes into a rank-one component aligned with the true parameter and an approximately isotropic bulk. Leveraging this structure, we combine random matrix theory with asymptotic statistical analysis to establish asymptotic normality of the parameter estimator, yielding a central limit theorem with convergence rate $T^{-1/4}$. Building on this, we propose Wald-type confidence sets and hypothesis tests that do not depend on the feature covariance matrix, achieving significantly tighter guarantees than existing non-asymptotic bounds. Extensive simulations confirm the superior coverage accuracy and inferential power of our approach.

📝 Abstract
Adaptively collected data has become ubiquitous within modern practice. However, even seemingly benign adaptive sampling schemes can introduce severe biases, rendering traditional statistical inference tools inapplicable. This can be mitigated by a property called stability, which states that if the rate at which an algorithm takes actions converges to a deterministic limit, one can expect that certain parameters are asymptotically normal. Building on a recent line of work for the multi-armed bandit setting, we show that the linear upper confidence bound (LinUCB) algorithm for linear bandits satisfies this property. In doing so, we painstakingly characterize the behavior of the eigenvalues and eigenvectors of the random design feature covariance matrix in the setting where the action set is the unit ball, showing that it decomposes into a rank-one direction that locks onto the true parameter and an almost-isotropic bulk that grows at a predictable $\sqrt{T}$ rate. This allows us to establish a central limit theorem for the LinUCB algorithm, establishing asymptotic normality for the limiting distribution of the estimation error where the convergence occurs at a $T^{-1/4}$ rate. The resulting Wald-type confidence sets and hypothesis tests do not depend on the feature covariance matrix and are asymptotically tighter than existing nonasymptotic confidence sets. Numerical simulations corroborate our findings.
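To make the setting concrete, the sketch below runs a minimal LinUCB loop with unit-norm actions (approximating maximization over the ball with random candidates) and then forms a classical plug-in Wald interval per coordinate. The candidate count, bonus scale `beta`, and noise level `sigma` are illustrative assumptions, and the interval uses the standard $\sigma^2 V^{-1}$ plug-in rather than the paper's covariance-free construction.

```python
import numpy as np

rng = np.random.default_rng(0)
d, T, beta, sigma = 3, 4000, 1.0, 0.1           # dimension/rounds/bonus/noise: illustrative
theta_star = np.array([0.8, -0.5, 0.33])
theta_star /= np.linalg.norm(theta_star)        # unknown parameter, normalized

V = np.eye(d)          # ridge-regularized design matrix V_t = I + sum_s a_s a_s^T
b = np.zeros(d)        # reward-weighted action sum

for _ in range(T):
    # approximate the unit-ball action set with random unit-norm candidates
    cand = rng.normal(size=(64, d))
    cand /= np.linalg.norm(cand, axis=1, keepdims=True)
    Vinv = np.linalg.inv(V)
    theta_hat = Vinv @ b
    # LinUCB: optimistic score = predicted reward + exploration bonus ||a||_{V^{-1}}
    bonus = np.sqrt(np.einsum("ij,jk,ik->i", cand, Vinv, cand))
    a = cand[np.argmax(cand @ theta_hat + beta * bonus)]
    r = a @ theta_star + sigma * rng.normal()   # noisy linear reward
    V += np.outer(a, a)
    b += r * a

theta_hat = np.linalg.solve(V, b)
# classical Wald-type 95% interval per coordinate (plug-in sigma^2 V^{-1})
se = sigma * np.sqrt(np.diag(np.linalg.inv(V)))
ci = np.stack([theta_hat - 1.96 * se, theta_hat + 1.96 * se], axis=1)
```

Despite the adaptively collected data, the regularized least-squares estimate recovers the parameter well here; the paper's contribution is showing when such Wald-type sets are actually valid under LinUCB's sampling.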
Problem

Research questions and friction points this paper is trying to address.

Addresses bias in adaptively collected data for statistical inference
Establishes asymptotic normality for LinUCB algorithm in linear bandits
Develops tighter confidence sets independent of feature covariance matrix
Innovation

Methods, ideas, or system contributions that make the work stand out.

LinUCB algorithm satisfies stability property
Eigenvalue analysis reveals rank-one and isotropic structure
Central limit theorem enables Wald-type confidence sets
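The spike-plus-bulk eigenstructure in the second bullet can be inspected empirically. The sketch below (illustrative parameters; a sampled-candidate LinUCB loop rather than exact maximization over the ball, so the bulk growth rate will not match the paper's $\sqrt{T}$ analysis exactly) checks that the top eigenvector of the design matrix locks onto the true parameter while the remaining eigenvalues stay far smaller.

```python
import numpy as np

rng = np.random.default_rng(1)
d, T, beta, sigma = 4, 4000, 1.0, 0.1           # illustrative parameters
theta_star = rng.normal(size=d)
theta_star /= np.linalg.norm(theta_star)

V, b = np.eye(d), np.zeros(d)
for _ in range(T):
    cand = rng.normal(size=(64, d))
    cand /= np.linalg.norm(cand, axis=1, keepdims=True)   # unit-norm candidate actions
    Vinv = np.linalg.inv(V)
    ucb = cand @ (Vinv @ b) + beta * np.sqrt(np.einsum("ij,jk,ik->i", cand, Vinv, cand))
    a = cand[np.argmax(ucb)]
    V += np.outer(a, a)
    b += (a @ theta_star + sigma * rng.normal()) * a

# eigh returns eigenvalues in ascending order
evals, evecs = np.linalg.eigh(V)
top = evecs[:, -1]
alignment = abs(top @ theta_star)   # rank-one spike: should be close to 1
bulk = evals[:-1]                   # remaining, much smaller eigenvalues
```

The dominant eigenvalue grows linearly in $T$ because LinUCB concentrates its actions along the estimated parameter direction, while the bulk is fed only by the exploration bonus.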
Wei Fan
Department of Statistics and Data Science, The Wharton School, University of Pennsylvania
Kevin Tan
Department of Statistics and Data Science, The Wharton School, University of Pennsylvania
Yuting Wei
Statistics and Data Science at Wharton, University of Pennsylvania
High-dimensional statistics, nonparametric statistics, reinforcement learning, diffusion models