Practical and Optimal Algorithm for Linear Contextual Bandits with Rare Parameter Updates

📅 2026-05-30

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses linear contextual bandits in which the learner may update its parameter estimate only rarely, specifically $O(\log\log T)$ times over a horizon of $T$ rounds. The paper proposes two practical algorithms, BLCE-G and BLCE, and extends the same rare-update and computational principles to generalized linear contextual bandits. BLCE-G, run under a static schedule, attains minimax-optimal regret up to polylogarithmic factors in $T$ for both small and large action sets (the small-$K$ and large-$K$ regimes). BLCE removes the near G-optimal design step, a dominant computational bottleneck in prior strictly batched static-grid methods, while preserving minimax-optimal regret and achieving the lowest known runtime complexity among optimal algorithms.

📝 Abstract

We study linear contextual bandits under rare parameter updates: the learner may incorporate reward feedback into its parameter estimate only at a small number of update times, while still observing contexts online and selecting actions sequentially. This viewpoint clarifies a practical distinction that is often blurred in the literature: many "strictly batched" methods additionally restrict within-interval context adaptivity, meaning that the action rule inside an interval cannot depend on the sequence of realized contexts/actions in that interval (beyond the current round's context). For linear contextual bandits, we propose two practical algorithms with only $O(\log\log T)$ parameter updates. Our first algorithm BLCE-G attains minimax-optimal regret (up to polylogarithmic factors in $T$) simultaneously in both the small-$K$ and large-$K$ regimes under a static schedule. Our second algorithm BLCE removes the near G-optimal design step -- a dominant computational bottleneck in prior strictly batched static-grid methods -- yet preserves minimax-optimal regret and achieves the lowest known runtime complexity among optimal algorithms. We further extend these rare-update and computational principles to generalized linear contextual bandits. Overall, our results yield statistically optimal algorithms under $O(\log\log T)$ parameter updates that are also computationally efficient in practice.
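To make the $O(\log\log T)$ update budget concrete, here is a minimal sketch of one static schedule with that many update times. The abstract does not state which grid BLCE-G or BLCE uses, so the form $t_m = \lceil T^{1-2^{-m}} \rceil$ (a grid commonly seen in batched bandit work) and the function name `static_update_grid` are assumptions made for illustration only.

```python
import math


def static_update_grid(T: int) -> list[int]:
    """Return a fixed list of update times with O(log log T) entries.

    Assumption (not taken from the paper): update times follow
    t_m = ceil(T ** (1 - 2 ** -m)) for m = 1..M, with the last time set to T.
    The schedule is "static": it depends only on T, never on observed data.
    """
    if T < 4:
        raise ValueError("T must be at least 4 for this sketch")
    # M = ceil(log2(log2(T))) grid points, plus possibly a final point at T.
    M = max(1, math.ceil(math.log2(math.log2(T))))
    grid: list[int] = []
    for m in range(1, M + 1):
        t = min(T, math.ceil(T ** (1.0 - 2.0 ** (-m))))
        if not grid or t > grid[-1]:  # keep the grid strictly increasing
            grid.append(t)
    if grid[-1] != T:
        grid.append(T)  # close the final interval at the horizon
    return grid


if __name__ == "__main__":
    for horizon in (10**3, 10**6, 10**9):
        grid = static_update_grid(horizon)
        print(horizon, len(grid), grid)
```

Between consecutive grid points the parameter estimate stays frozen, while contexts are still observed and actions chosen every round. Growing the horizon from $10^3$ to $10^9$ adds only a couple of update times under this grid, which is the practical meaning of an $O(\log\log T)$ budget.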

Problem

Research questions and friction points this paper is trying to address.

linear contextual bandits

rare parameter updates

minimax-optimal regret

computational efficiency

batched learning

Innovation

Methods, ideas, or system contributions that make the work stand out.

linear contextual bandits

rare parameter updates

minimax-optimal regret