Batched Nonparametric Bandits via k-Nearest Neighbor UCB

📅 2025-05-15

📈 Citations: 0

✨ Influential: 0

career value

211K/year

🤖 AI Summary

This paper studies the nonparametric contextual multi-armed bandit problem in the batched setting, motivated by applications such as medical interventions and precision marketing—domains characterized by scarce and significantly delayed online feedback. We propose the first fully nonparametric batched bandit algorithm: it employs adaptive k-nearest neighbor regression for locally geometry-aware reward estimation and integrates a UCB-based exploration-exploitation trade-off. Crucially, it imposes no assumptions on reward function form, grid-based binning, or prior knowledge of intrinsic dimensionality, and automatically adapts to the underlying context dimension. Theoretically, we establish that it achieves the minimax-optimal regret bound $ ilde{O}(T^{frac{d+1}{d+2}})$ in the nonparametric regime—the first such guarantee for a batched algorithm in this setting. Experiments on synthetic and real-world datasets demonstrate substantial improvements over state-of-the-art binning-based baselines, confirming both robustness and practical efficacy.

Technology Category

Application Category

📝 Abstract

We study sequential decision-making in batched nonparametric contextual bandits, where actions are selected over a finite horizon divided into a small number of batches. Motivated by constraints in domains such as medicine and marketing -- where online feedback is limited -- we propose a nonparametric algorithm that combines adaptive k-nearest neighbor (k-NN) regression with the upper confidence bound (UCB) principle. Our method, BaNk-UCB, is fully nonparametric, adapts to the context dimension, and is simple to implement. Unlike prior work relying on parametric or binning-based estimators, BaNk-UCB uses local geometry to estimate rewards and adaptively balances exploration and exploitation. We provide near-optimal regret guarantees under standard Lipschitz smoothness and margin assumptions, using a theoretically motivated batch schedule that balances regret across batches and achieves minimax-optimal rates. Empirical evaluations on synthetic and real-world datasets demonstrate that BaNk-UCB consistently outperforms binning-based baselines.

Problem

Research questions and friction points this paper is trying to address.

Sequential decision-making in batched nonparametric contextual bandits

Adaptive k-NN regression with UCB for limited feedback domains

Achieving near-optimal regret with minimax-optimal rates

Innovation

Methods, ideas, or system contributions that make the work stand out.

Combines k-NN regression with UCB principle

Adapts to context dimension nonparametrically

Uses local geometry for reward estimation

🔎 Similar Papers

Batched Nonparametric Contextual Bandits