High-dimensional Contextual Bandit Problem without Sparsity

๐Ÿ“… 2023-06-19
๐Ÿ›๏ธ Neural Information Processing Systems
๐Ÿ“ˆ Citations: 2
โœจ Influential: 0
๐Ÿ“„ PDF
๐Ÿค– AI Summary
This paper studies high-dimensional linear contextual bandits where the feature dimension $p$ may vastly exceed the time horizon $T$—even diverging to infinity—and where the regression coefficients are *not assumed sparse*. Under a low effective-rank assumption on the feature covariance, we establish, for the first time in this non-sparse overparameterized regime, the optimal cumulative regret rate $O(\sqrt{T})$ for the explore-then-commit (EtC) framework. We propose Adaptive EtC (AEtC), an algorithm that employs minimum-norm interpolation estimators and a data-dependent stopping rule to automatically balance exploration and exploitation. Theoretically, AEtC achieves the optimal statistical rate without sparsity assumptions; empirically, it significantly outperforms standard EtC and existing baselines. Our key contribution is breaking the sparsity barrier by systematically integrating overparameterized learning theory into contextual bandits, yielding a novel analytical paradigm and practical algorithm for high-dimensional, non-sparse sequential decision-making.
๐Ÿ“ Abstract
In this research, we investigate the high-dimensional linear contextual bandit problem where the number of features $p$ is greater than the budget $T$, or it may even be infinite. Differing from the majority of previous works in this field, we do not impose sparsity on the regression coefficients. Instead, we rely on recent findings on overparameterized models, which enable us to analyze the performance of the minimum-norm interpolating estimator when data distributions have small effective ranks. We propose an explore-then-commit (EtC) algorithm to address this problem and examine its performance. Through our analysis, we derive the optimal rate of the EtC algorithm in terms of $T$ and show that this rate can be achieved by balancing exploration and exploitation. Moreover, we introduce an adaptive explore-then-commit (AEtC) algorithm that adaptively finds the optimal balance. We assess the performance of the proposed algorithms through a series of simulations.
Problem

Research questions and friction points this paper is trying to address.

Solving high-dimensional contextual bandits without sparsity assumptions
Analyzing minimum-norm estimators for overparameterized bandit models
Designing adaptive algorithms for optimal exploration-exploitation balance
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses minimum-norm interpolating estimator
Proposes explore-then-commit algorithm
Introduces adaptive explore-then-commit algorithm
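The explore-then-commit scheme with a minimum-norm interpolating estimator can be sketched roughly as follows. This is a minimal illustrative simulation under assumed settings (fixed arm features, Gaussian noise), not the authors' implementation; all function names and the data-generating setup are assumptions.

```python
import numpy as np

def min_norm_estimator(X, y):
    """Minimum-norm interpolating estimator: the least-l2-norm
    solution of X @ theta = y, computed via the pseudoinverse.
    When p > n, it interpolates the exploration data exactly."""
    return np.linalg.pinv(X) @ y

def explore_then_commit(arm_features, theta_star, T, n_explore,
                        noise=0.1, seed=0):
    """Plain EtC on a K-armed linear contextual bandit with fixed
    arm features (K x p, with p possibly much larger than T).
    Explore uniformly for n_explore rounds, fit theta by minimum-norm
    interpolation, then commit to the greedy arm. Returns cumulative
    regret; balancing n_explore against T drives the regret rate."""
    rng = np.random.default_rng(seed)
    K, p = arm_features.shape
    means = arm_features @ theta_star
    best = means.max()
    X, y, regret = [], [], 0.0
    # Exploration phase: pull arms uniformly at random.
    for _ in range(n_explore):
        a = rng.integers(K)
        r = means[a] + noise * rng.standard_normal()
        X.append(arm_features[a])
        y.append(r)
        regret += best - means[a]
    # Commit phase: fit theta once, then always play the greedy arm.
    theta = min_norm_estimator(np.array(X), np.array(y))
    a = int(np.argmax(arm_features @ theta))
    regret += (T - n_explore) * (best - means[a])
    return regret
```

The adaptive variant (AEtC) would replace the fixed `n_explore` with a data-dependent stopping rule; that rule is the paper's contribution and is not reproduced here.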
Junpei Komiyama
New York University / MBZUAI / RIKEN
Artificial Intelligence · Machine Learning
M. Imaizumi
The University of Tokyo / RIKEN Center for AIP