High-dimensional Nonparametric Contextual Bandit Problem

📅 2025-05-20
📈 Citations: 0
Influential: 0
🤖 AI Summary
This paper addresses the high-dimensional nonparametric contextual bandit problem, aiming to overcome a fundamental limitation of classical kernel methods (e.g., Gaussian kernels), which incur a trivial $O(T)$ regret when the feature dimension scales as $\Omega(\log T)$. Under a stochastic assumption on the context distribution, the authors propose a kernelized nonparametric learning framework and show that no-regret learning remains possible as the dimension grows. In particular, they establish that sublinear regret is achievable even when the dimension grows as $\omega(\log T)$. Furthermore, they analyze lenient regret, which tolerates a per-round regret of at most $\Delta$, and derive a $\Delta$-dependent rate of $O(T\Delta^{1/2})$, a marked improvement over the prior $O(T)$ bound. The approach provides a nonparametric solution for high-dimensional online sequential decision-making (e.g., personalized recommendation) that is both statistically sound and computationally feasible.
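To make the baseline concrete, below is a minimal sketch (not the paper's algorithm) of the classical kernel-UCB approach with a Gaussian (RBF) kernel, i.e., the method class whose regret guarantee degrades to $O(T)$ once the dimension reaches $\Omega(\log T)$. All class names, parameters, and hyperparameter values are illustrative assumptions.

```python
# Illustrative kernel-UCB-style contextual bandit (per-arm kernel ridge
# regression plus an upper-confidence bonus). Not the paper's method.
import numpy as np

def rbf_kernel(X, Y, lengthscale=1.0):
    # Gaussian kernel k(x, y) = exp(-||x - y||^2 / (2 * lengthscale^2))
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * lengthscale ** 2))

class KernelUCBArm:
    """Per-arm kernel ridge regression with a GP-style confidence bonus."""
    def __init__(self, lam=1.0, beta=1.0, lengthscale=1.0):
        self.lam, self.beta, self.ls = lam, beta, lengthscale
        self.X, self.y = [], []

    def ucb(self, x):
        if not self.X:                      # unseen arm: force exploration
            return np.inf
        X = np.vstack(self.X)
        K = rbf_kernel(X, X, self.ls) + self.lam * np.eye(len(self.X))
        k = rbf_kernel(X, x[None, :], self.ls).ravel()
        alpha = np.linalg.solve(K, np.array(self.y))
        mean = k @ alpha                     # posterior-mean-style estimate
        var = rbf_kernel(x[None, :], x[None, :], self.ls)[0, 0] \
              - k @ np.linalg.solve(K, k)    # posterior-variance-style bonus
        return mean + self.beta * np.sqrt(max(var, 0.0))

    def update(self, x, r):
        self.X.append(x)
        self.y.append(r)

# Usage: at round t, observe context x_t, pull argmax_a arms[a].ucb(x_t),
# observe the reward, and call update on the chosen arm.
```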


📝 Abstract
We consider the kernelized contextual bandit problem with a large feature space. The problem involves $K$ arms, and the goal of the forecaster is to maximize the cumulative reward by learning the relationship between the contexts and the rewards. It serves as a general framework for various decision-making scenarios, such as personalized online advertising and recommendation systems. Kernelized contextual bandits generalize the linear contextual bandit problem and offer greater modeling flexibility. Existing methods, when applied to Gaussian kernels, yield a trivial bound of $O(T)$ when we consider $\Omega(\log T)$ feature dimensions. To address this, we introduce stochastic assumptions on the context distribution and show that no-regret learning is achievable even when the number of dimensions grows up to the number of samples. Furthermore, we analyze lenient regret, which allows a per-round regret of at most $\Delta > 0$, and derive the rate of lenient regret in terms of $\Delta$.
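As a quick illustration of the lenient regret notion from the abstract, the sketch below computes one common formulation in which per-round suboptimality gaps of at most $\Delta$ are simply ignored; the exact definition used in the paper may differ, so treat this as an assumption.

```python
# Illustrative only: lenient regret that counts a round's gap only when
# it exceeds the tolerance delta (one common formulation, not necessarily
# the paper's exact definition).
import numpy as np

def lenient_regret(optimal_rewards, obtained_rewards, delta):
    """Sum per-round gaps only over rounds where the gap exceeds delta."""
    gaps = np.asarray(optimal_rewards) - np.asarray(obtained_rewards)
    return gaps[gaps > delta].sum()

# Example: with delta = 0.1, the first round's gap of 0.05 is tolerated,
# so only the second round's gap of 0.5 is counted.
print(lenient_regret([1.0, 1.0, 1.0], [0.95, 0.5, 1.0], delta=0.1))  # 0.5
```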
Problem

Research questions and friction points this paper is trying to address.

Solving high-dimensional nonparametric contextual bandit problems
Achieving no-regret learning with growing feature dimensions
Analyzing lenient regret rates for decision-making scenarios
Innovation

Methods, ideas, or system contributions that make the work stand out.

Kernelized contextual bandits for high-dimensional spaces
Stochastic assumptions enable no-regret learning
Lenient regret analysis with per-round allowance