🤖 AI Summary
This paper studies the finite-armed semiparametric bandit problem, where the reward of each arm decomposes into a linear component plus an unknown, possibly adversarial offset term, unifying linear structure and nonlinear perturbations in a single model. The authors propose the first unified framework integrating orthogonalized regression, adaptive experimental design, and non-asymptotic statistical analysis. The method simultaneously achieves tight regret bounds, PAC guarantees, and optimal-arm identification. Under general conditions it attains the minimax-optimal $\tilde{O}(\sqrt{dT})$ regret, matching the lower bound for finite-armed linear bandits, and it further achieves logarithmic regret whenever a positive suboptimality gap exists. The approach is robust to adversarial offsets and computationally efficient, significantly extending the applicability of classical linear bandits beyond strict linearity assumptions.
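Concretely, the reward structure described above is typically written as follows (a standard formulation of the semiparametric bandit model; the notation here is illustrative, not necessarily the paper's own):

$$
y_t \;=\; \langle x_{a_t}, \theta^* \rangle \;+\; \nu_t \;+\; \eta_t,
$$

where $x_{a_t} \in \mathbb{R}^d$ is the feature vector of the arm chosen at round $t$, $\theta^* \in \mathbb{R}^d$ is the unknown linear parameter, $\nu_t$ is an offset shared by all arms at round $t$ (possibly chosen adversarially as a function of history, but independent of which arm is pulled), and $\eta_t$ is zero-mean noise. Setting $\nu_t \equiv 0$ recovers the classical finite-armed linear bandit.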
📝 Abstract
We study finite-armed semiparametric bandits, where each arm's reward combines a linear component with an unknown, potentially adversarial shift. This model strictly generalizes classical linear bandits and reflects complexities common in practice. We propose the first experimental-design approach that simultaneously offers a sharp regret bound, a PAC bound, and a best-arm identification guarantee. Our method attains the minimax regret $\tilde{O}(\sqrt{dT})$, matching the known lower bound for finite-armed linear bandits, and further achieves logarithmic regret under a positive suboptimality gap condition. These guarantees follow from our refined non-asymptotic analysis of orthogonalized regression that attains the optimal $\sqrt{d}$ rate, paving the way for robust and efficient learning across a broad class of semiparametric bandit problems.
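To build intuition for why orthogonalized regression is needed here, the following is a minimal NumPy sketch of the centering idea common in the semiparametric bandit literature: because the offset $\nu_t$ is fixed before the arm is sampled, regressing rewards on features *centered by the sampling distribution's mean* makes the offset uncorrelated with the regressors, so ordinary least squares on centered features recovers $\theta^*$. All names and the specific offset sequence below are illustrative assumptions, not the paper's algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)
d, K, T = 3, 5, 50_000
theta = rng.normal(size=d)          # unknown linear parameter theta*
arms = rng.normal(size=(K, d))      # K fixed arm feature vectors in R^d
p = np.full(K, 1.0 / K)             # uniform sampling distribution over arms
mu = arms.T @ p                     # mean feature under the sampling distribution

X_centered, y = [], []
for t in range(T):
    nu = 1.0 + np.sin(t)            # shared, arm-independent offset (illustrative)
    a = rng.choice(K, p=p)          # arm drawn AFTER nu is fixed
    x = arms[a]
    X_centered.append(x - mu)       # center features: E[x - mu] = 0 under p
    y.append(x @ theta + nu + 0.1 * rng.normal())

X_centered, y = np.array(X_centered), np.array(y)

# Least squares on centered features: the offset is absorbed into the
# residual and is uncorrelated with (x - mu), so theta_hat is consistent.
theta_hat = np.linalg.lstsq(X_centered, y, rcond=None)[0]
print(np.linalg.norm(theta_hat - theta))   # small estimation error
```

Naive least squares on the raw (uncentered) features would instead absorb the mean offset into the estimate of $\theta$, biasing it; centering removes exactly that bias. The paper's contribution, per the abstract, is a sharper non-asymptotic analysis of this style of orthogonalized estimator combined with adaptive experimental design, which this toy sketch does not attempt to reproduce.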