Optimal Best Arm Identification with Post-Action Context

📅 2025-02-05
📈 Citations: 0
Influential: 0
🤖 AI Summary
This paper studies best-arm identification (BAI) in stochastic multi-armed bandits with post-action context: additional contextual information that is revealed only after an action is taken. It can inform later decisions but is unavailable when the current action is selected. We introduce a BAI framework that explicitly models post-action context and distinguish two context structures: non-separator, where the reward depends on both the action and the context, and separator, where the reward depends only on the context. For the non-separator case, we show that the Track-and-Stop algorithm can be extended to this setting. For the separator case, we propose a geometry-aware sampling rule, G-tracking, which uses the geometry of the context space to track contexts directly rather than actions. For both cases, we derive instance-dependent lower bounds on the sample complexity and show that the proposed algorithms achieve them asymptotically. Experiments show an advantage over state-of-the-art BAI algorithms.
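To illustrate one reason the separator structure can help, the Python sketch below pools reward observations by context across all arms. It assumes a finite context set and observations stored as (arm, context, reward) triples. The function name `pooled_separator_estimates` is hypothetical, and this plug-in estimator is an illustration of the separator assumption, not the paper's method.

```python
def pooled_separator_estimates(history, num_arms, num_contexts):
    """Estimate arm means under the separator assumption (illustrative sketch).

    history: list of (arm, context, reward) triples with integer arm/context ids.
    Assumes E[reward | arm, context] = r(context), so reward samples observed
    at a context are pooled across *all* arms.
    Returns (r_hat, mu_hat) with mu_hat[a] = sum_x p_hat_a(x) * r_hat(x).
    """
    ctx_count = [[0] * num_contexts for _ in range(num_arms)]  # visits of x via arm a
    reward_sum = [0.0] * num_contexts
    reward_n = [0] * num_contexts
    for a, x, y in history:
        ctx_count[a][x] += 1
        reward_sum[x] += y
        reward_n[x] += 1
    # Unvisited contexts get a placeholder of 0.0; they carry zero weight in
    # mu_hat because p_hat_a(x) = 0 for every arm when x was never observed.
    r_hat = [s / n if n else 0.0 for s, n in zip(reward_sum, reward_n)]
    mu_hat = []
    for a in range(num_arms):
        n_a = sum(ctx_count[a])
        if n_a == 0:
            mu_hat.append(float("nan"))  # arm never pulled: no context estimate yet
            continue
        mu_hat.append(sum(c / n_a * r_hat[x] for x, c in enumerate(ctx_count[a])))
    return r_hat, mu_hat


history = [(0, 0, 1.0), (0, 1, 3.0), (1, 1, 5.0), (1, 1, 4.0)]
r_hat, mu_hat = pooled_separator_estimates(history, num_arms=2, num_contexts=2)
# r_hat == [1.0, 4.0], mu_hat == [2.5, 4.0]
```

Arm 0's naive per-arm average would be 2.0. The pooled estimate of 2.5 also uses arm 1's rewards at context 1, which is valid only when the reward depends on the context alone.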

📝 Abstract
We introduce the problem of best arm identification (BAI) with post-action context, a new BAI problem in a stochastic multi-armed bandit environment under the fixed-confidence setting. The problem addresses scenarios in which the learner receives a post-action context in addition to the reward after playing each action. This post-action context provides additional information that can significantly facilitate the decision process. We analyze two types of post-action context: (i) non-separator, where the reward depends on both the action and the context, and (ii) separator, where the reward depends solely on the context. For both cases, we derive instance-dependent lower bounds on the sample complexity and propose algorithms that asymptotically achieve the optimal sample complexity. For the non-separator setting, we do so by demonstrating that the Track-and-Stop algorithm can be extended to this setting. For the separator setting, we propose a novel sampling rule called G-tracking, which uses the geometry of the context space to directly track the contexts rather than the actions. Finally, our empirical results showcase the advantage of our approaches compared to the state of the art.
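The following is a minimal Python sketch of the two environment types. It assumes a finite set of contexts, a per-arm context distribution p_a(x), and unit-variance Gaussian rewards. All of these are illustrative assumptions, and the class name `PostActionContextBandit` is hypothetical. The two cases differ only in whether the mean reward is indexed by the context alone or by the (arm, context) pair. In both, the context is drawn after the arm is chosen.

```python
import random


class PostActionContextBandit:
    """Toy environment with a finite context set (illustrative assumptions).

    context_probs[a][x]: probability that pulling arm a reveals context x.
    reward_means: flat list r[x] (separator: reward depends on the context only)
                  or nested list r[a][x] (non-separator: depends on arm and context).
    Rewards are Gaussian with unit variance around the relevant mean.
    """

    def __init__(self, context_probs, reward_means, seed=0):
        self.p = [list(row) for row in context_probs]
        self.r = reward_means
        self.separator = not isinstance(reward_means[0], (list, tuple))
        self.rng = random.Random(seed)

    def mean_given(self, a, x):
        return self.r[x] if self.separator else self.r[a][x]

    def arm_means(self):
        # mu_a = sum_x p_a(x) * r(x)      (separator)
        # mu_a = sum_x p_a(x) * r(a, x)   (non-separator)
        return [sum(px * self.mean_given(a, x) for x, px in enumerate(row))
                for a, row in enumerate(self.p)]

    def pull(self, a):
        # The arm is chosen first; only then is the context x drawn and revealed.
        x = self.rng.choices(range(len(self.p[a])), weights=self.p[a])[0]
        reward = self.rng.gauss(self.mean_given(a, x), 1.0)
        return x, reward


env = PostActionContextBandit([[0.9, 0.1], [0.2, 0.8]], reward_means=[0.0, 1.0])
print(env.arm_means())  # [0.1, 0.8]
context, reward = env.pull(1)
```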
Problem

Identify the best arm when a post-action context is observed after each action
Analyze non-separator and separator post-action contexts
Design algorithms with asymptotically optimal sample complexity
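For intuition about instance-dependent lower bounds, the sketch below evaluates the classic fixed-confidence bound for standard BAI without context (Garivier and Kaufmann, 2016): E[τ] ≥ T*(μ) kl(δ, 1 - δ), where kl(δ, 1 - δ) ≥ log(1/(2.4δ)). It covers only the simplest case of two Gaussian arms with common variance σ², where T*(μ) = 8σ²/Δ². The bounds derived for post-action context are different and are not reproduced here. The function names are hypothetical.

```python
import math


def kl_bernoulli(x, y):
    """KL divergence between Bernoulli(x) and Bernoulli(y), for 0 < x, y < 1."""
    return x * math.log(x / y) + (1 - x) * math.log((1 - x) / (1 - y))


def two_arm_gaussian_lower_bound(gap, delta, sigma=1.0):
    """Classic bound E[tau] >= T*(mu) * kl(delta, 1 - delta) for two Gaussian arms.

    With two arms of common variance sigma**2 the optimal allocation is (1/2, 1/2)
    and T*(mu) = 8 * sigma**2 / gap**2.
    """
    t_star = 8 * sigma ** 2 / gap ** 2
    return t_star * kl_bernoulli(delta, 1 - delta)


# Two arms separated by a gap of 1 with delta = 0.1: roughly 14 samples in expectation.
bound = two_arm_gaussian_lower_bound(gap=1.0, delta=0.1)
```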
Innovation

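Extends the Track-and-Stop algorithm to the non-separator setting
Introduces the G-tracking sampling rule for the separator setting
Analyzes two types of post-action context (separator and non-separator)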
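The classic Track-and-Stop algorithm (Garivier and Kaufmann, 2016) combines a D-tracking sampling rule with a generalized likelihood ratio (GLR) stopping rule. The Python sketch below shows both pieces for unit-variance Gaussian arms without context. It treats the optimal allocation w*(μ̂) as an input computed elsewhere. It is not the extension developed in this paper for the non-separator setting, and the function names are hypothetical.

```python
import math


def d_tracking_arm(counts, target_weights):
    """Classic D-tracking step from Track-and-Stop.

    counts[a]: number of pulls of arm a so far (t = sum(counts)).
    target_weights[a]: current estimate of the optimal allocation w*(mu_hat),
    assumed to be computed elsewhere (the optimisation is not shown here).
    """
    k = len(counts)
    t = sum(counts)
    # Forced exploration: arms pulled fewer than sqrt(t) - K/2 times.
    under = [a for a in range(k) if counts[a] < math.sqrt(t) - k / 2]
    if under:
        return min(under, key=lambda a: counts[a])
    # Otherwise pull the arm lagging furthest behind its target count t * w_a.
    return max(range(k), key=lambda a: t * target_weights[a] - counts[a])


def gaussian_glr(counts, means):
    """Empirical best arm and min over challengers b of the GLR statistic Z_{best,b}.

    For unit-variance Gaussians, Z = N_a N_b / (N_a + N_b) * (mu_a - mu_b)^2 / 2.
    """
    best = max(range(len(means)), key=lambda a: means[a])
    z = []
    for b in range(len(means)):
        if b == best:
            continue
        n_a, n_b = counts[best], counts[b]
        gap = means[best] - means[b]
        z.append(n_a * n_b / (n_a + n_b) * gap ** 2 / 2)
    return best, min(z)


arm = d_tracking_arm([10, 1, 10], [0.4, 0.3, 0.3])  # forced exploration picks arm 1
```

In a full loop, one would call `d_tracking_arm` each round with the current counts and an updated estimate of w*, then stop and recommend `best` once the GLR statistic exceeds a threshold β(t, δ) chosen by the user.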