High-dimensional Nonparametric Contextual Bandit Problem

📅 2025-05-20
📈 Citations: 0
Influential: 0
🤖 AI Summary
This paper addresses the high-dimensional nonparametric contextual bandit problem, aiming to overcome a fundamental limitation of classical kernel methods (e.g., Gaussian kernels), which incur a trivial $O(T)$ regret when the feature dimension scales as $\Omega(\log T)$. Under a stochastic assumption on the context distribution, the authors propose a kernelized nonparametric learning framework and show that no-regret learning remains possible as the dimension grows. In particular, they establish that sublinear regret is achievable even when the dimension grows as $\omega(\log T)$. Furthermore, they analyze lenient regret, which tolerates a per-round regret of at most $\Delta$, and derive a $\Delta$-dependent rate of $O(T\Delta^{1/2})$, a marked improvement over the prior $O(T)$ bound. The approach provides a nonparametric solution for high-dimensional online sequential decision-making (e.g., personalized recommendation) that is both statistically sound and computationally feasible.
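To make the baseline concrete, below is a minimal sketch (not the paper's algorithm) of the classical kernel-UCB approach with a Gaussian (RBF) kernel, i.e., the method class whose regret guarantee degrades to $O(T)$ once the dimension reaches $\Omega(\log T)$. All class names, parameters, and hyperparameter values are illustrative assumptions.

```python
# Illustrative kernel-UCB-style contextual bandit (per-arm kernel ridge
# regression plus an upper-confidence bonus). Not the paper's method.
import numpy as np

def rbf_kernel(X, Y, lengthscale=1.0):
    # Gaussian kernel k(x, y) = exp(-||x - y||^2 / (2 * lengthscale^2))
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * lengthscale ** 2))

class KernelUCBArm:
    """Per-arm kernel ridge regression with a GP-style confidence bonus."""
    def __init__(self, lam=1.0, beta=1.0, lengthscale=1.0):
        self.lam, self.beta, self.ls = lam, beta, lengthscale
        self.X, self.y = [], []

    def ucb(self, x):
        if not self.X:                      # unseen arm: force exploration
            return np.inf
        X = np.vstack(self.X)
        K = rbf_kernel(X, X, self.ls) + self.lam * np.eye(len(self.X))
        k = rbf_kernel(X, x[None, :], self.ls).ravel()
        alpha = np.linalg.solve(K, np.array(self.y))
        mean = k @ alpha                     # posterior-mean-style estimate
        var = rbf_kernel(x[None, :], x[None, :], self.ls)[0, 0] \
              - k @ np.linalg.solve(K, k)    # posterior-variance-style bonus
        return mean + self.beta * np.sqrt(max(var, 0.0))

    def update(self, x, r):
        self.X.append(x)
        self.y.append(r)

# Usage: at round t, observe context x_t, pull argmax_a arms[a].ucb(x_t),
# observe the reward, and call update on the chosen arm.
```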


📝 Abstract
We consider the kernelized contextual bandit problem with a large feature space. The problem involves $K$ arms, and the goal of the forecaster is to maximize the cumulative reward by learning the relationship between the contexts and the rewards. It serves as a general framework for various decision-making scenarios, such as personalized online advertising and recommendation systems. Kernelized contextual bandits generalize the linear contextual bandit problem and offer greater modeling flexibility. Existing methods, when applied to Gaussian kernels, yield a trivial bound of $O(T)$ when we consider $\Omega(\log T)$ feature dimensions. To address this, we introduce stochastic assumptions on the context distribution and show that no-regret learning is achievable even when the number of dimensions grows up to the number of samples. Furthermore, we analyze lenient regret, which allows a per-round regret of at most $\Delta > 0$, and derive the rate of lenient regret in terms of $\Delta$.
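As a quick illustration of the lenient regret notion from the abstract, the sketch below computes one common formulation in which per-round suboptimality gaps of at most $\Delta$ are simply ignored; the exact definition used in the paper may differ, so treat this as an assumption.

```python
# Illustrative only: lenient regret that counts a round's gap only when
# it exceeds the tolerance delta (one common formulation, not necessarily
# the paper's exact definition).
import numpy as np

def lenient_regret(optimal_rewards, obtained_rewards, delta):
    """Sum per-round gaps only over rounds where the gap exceeds delta."""
    gaps = np.asarray(optimal_rewards) - np.asarray(obtained_rewards)
    return gaps[gaps > delta].sum()

# Example: with delta = 0.1, the first round's gap of 0.05 is tolerated,
# so only the second round's gap of 0.5 is counted.
print(lenient_regret([1.0, 1.0, 1.0], [0.95, 0.5, 1.0], delta=0.1))  # 0.5
```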
Problem

Research questions and friction points this paper is trying to address.

Solving high-dimensional nonparametric contextual bandit problems
Achieving no-regret learning with growing feature dimensions
Analyzing lenient regret rates for decision-making scenarios
Innovation

Methods, ideas, or system contributions that make the work stand out.

Kernelized contextual bandits for high-dimensional spaces
Stochastic assumptions enable no-regret learning
Lenient regret analysis with per-round allowance