Representative Action Selection for Large Action-Space Meta-Bandits

📅 2025-05-23
🤖 AI Summary
For meta-multi-armed bandits with large action spaces, this paper proposes a framework for selecting a representative subset of actions that approximates full-action-space performance while drastically reducing the number of actions. Methodologically, it is the first to combine Gaussian-process-based modeling of action similarity with ε-net sampling, yielding a theoretically grounded and computationally efficient selection mechanism, which is further enhanced by meta-learning for constrained bandit policy optimization. It establishes a tight theoretical upper bound on the performance loss induced by the selected subset. Empirically, the approach retains over 95% of the original reward under 90% action compression and consistently outperforms Thompson Sampling and UCB across multiple benchmarks. The core contribution is the first action-space compression paradigm for bandits that offers both rigorous theoretical guarantees and practical efficacy.

๐Ÿ“ Abstract
We study the problem of selecting a subset from a large action space shared by a family of bandits, with the goal of achieving performance nearly matching that of using the full action space. We assume that similar actions tend to have related payoffs, modeled by a Gaussian process. To exploit this structure, we propose a simple epsilon-net algorithm to select a representative subset. We provide theoretical guarantees for its performance and compare it empirically to Thompson Sampling and Upper Confidence Bound.
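The abstract does not spell out the selection procedure. The sketch below shows a standard greedy ε-net construction. It assumes that each action is embedded as a point in Euclidean space and that closeness in that space reflects payoff similarity. The names `epsilon_net` and `covering_radius` are hypothetical, and the paper's algorithm may differ in its details.

```python
import math
from typing import List, Sequence

Point = Sequence[float]


def epsilon_net(actions: List[Point], eps: float) -> List[int]:
    """Greedily select indices of representative actions (an eps-net).

    Covering: every action ends up within distance eps of some representative.
    Packing: selected representatives are pairwise more than eps apart.
    """
    reps: List[int] = []
    for i, a in enumerate(actions):
        # Keep action i only if no representative chosen so far covers it.
        if all(math.dist(a, actions[j]) > eps for j in reps):
            reps.append(i)
    return reps


def covering_radius(actions: List[Point], reps: List[int]) -> float:
    """Largest distance from any action to its nearest representative."""
    return max(min(math.dist(a, actions[j]) for j in reps) for a in actions)


# Usage: 101 actions on a 1-D grid over [0, 1], compressed with eps = 0.1.
grid = [(k / 100,) for k in range(101)]
reps = epsilon_net(grid, eps=0.1)
print(len(reps), "representatives, covering radius", covering_radius(grid, reps))
```

The greedy rule yields both properties of an ε-net in a single pass. Every skipped action is already within ε of an earlier representative, which gives covering. Every kept action is more than ε from all earlier ones, which gives packing and caps how many representatives survive.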
Problem

Research questions and friction points this paper is trying to address.

Selecting a subset of a large action space shared by a family of bandits
Achieving performance close to that of the full action space
Modeling the payoffs of similar actions with a Gaussian process
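The Gaussian-process assumption can be made concrete with a short sketch. The page does not name the kernel, so this sketch assumes a squared-exponential (RBF) kernel with a hypothetical lengthscale of 0.2. Each bandit in the family draws its vector of mean payoffs from GP(0, k) over the shared actions, so actions that are close together receive strongly correlated payoffs. All function names are illustrative. The Cholesky routine is written out only to keep the example dependency-free.

```python
import math
import random
from typing import List, Sequence


def rbf_kernel(x: Sequence[float], y: Sequence[float], lengthscale: float) -> float:
    """Squared-exponential kernel: nearby actions get covariance close to 1."""
    return math.exp(-math.dist(x, y) ** 2 / (2.0 * lengthscale ** 2))


def cholesky(K: List[List[float]]) -> List[List[float]]:
    """Return lower-triangular L with L L^T = K (K symmetric positive definite)."""
    n = len(K)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][i] = math.sqrt(K[i][i] - s)
            else:
                L[i][j] = (K[i][j] - s) / L[j][j]
    return L


def gp_prior_factor(actions, lengthscale=0.2, jitter=1e-6):
    """Cholesky factor of the kernel matrix over all actions (jitter aids stability)."""
    K = [[rbf_kernel(a, b, lengthscale) + (jitter if i == j else 0.0)
          for j, b in enumerate(actions)]
         for i, a in enumerate(actions)]
    return cholesky(K)


def sample_bandit(L: List[List[float]], rng: random.Random) -> List[float]:
    """Draw one bandit's mean payoffs f = L z with z ~ N(0, I), so Cov(f) = K."""
    z = [rng.gauss(0.0, 1.0) for _ in range(len(L))]
    return [sum(L[i][k] * z[k] for k in range(i + 1)) for i in range(len(L))]


# Usage: a family of 5 bandits sharing 21 actions on [0, 1].
actions = [(k / 20,) for k in range(21)]
L = gp_prior_factor(actions)
rng = random.Random(0)
family = [sample_bandit(L, rng) for _ in range(5)]
print([round(f, 2) for f in family[0][:5]])
```

Because the covariance between two actions decays with their distance, knowing the payoff of one representative says a lot about its neighbors. This is the structure that makes a small representative subset viable.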
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses a Gaussian process to model payoffs
Proposes an epsilon-net algorithm to select a representative subset
Compares empirically with Thompson Sampling and UCB
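The page does not describe the experimental protocol. The sketch below shows one hypothetical setup. It runs standard UCB1 and Beta-Bernoulli Thompson Sampling on the full action set and on a greedy ε-net subset. It assumes Bernoulli rewards and a one-dimensional action space with a made-up 0.8-Lipschitz mean-payoff function, which stands in for the GP model. The names `mean_payoff`, `greedy_net`, `ucb1` and `thompson` are illustrative. The Lipschitz check is a generic covering argument, not the paper's GP-based bound. Any 0.8-Lipschitz payoff loses at most 0.8·ε in the best achievable mean when restricted to an ε-net.

```python
import math
import random
from typing import List


def ucb1(means: List[float], horizon: int, rng: random.Random) -> float:
    """UCB1 on Bernoulli arms with the given means; returns total reward."""
    k = len(means)
    pulls = [0] * k
    wins = [0.0] * k
    total = 0.0
    for t in range(horizon):
        if t < k:
            arm = t  # initialization: pull every arm once
        else:
            arm = max(range(k), key=lambda a: wins[a] / pulls[a]
                      + math.sqrt(2.0 * math.log(t) / pulls[a]))
        r = 1.0 if rng.random() < means[arm] else 0.0
        pulls[arm] += 1
        wins[arm] += r
        total += r
    return total


def thompson(means: List[float], horizon: int, rng: random.Random) -> float:
    """Beta-Bernoulli Thompson Sampling with a Beta(1, 1) prior; returns total reward."""
    k = len(means)
    alpha = [1.0] * k
    beta = [1.0] * k
    total = 0.0
    for _ in range(horizon):
        # Sample a plausible mean for each arm and play the best sample.
        arm = max(range(k), key=lambda a: rng.betavariate(alpha[a], beta[a]))
        r = 1.0 if rng.random() < means[arm] else 0.0
        alpha[arm] += r
        beta[arm] += 1.0 - r
        total += r
    return total


def mean_payoff(x: float) -> float:
    """Hypothetical 0.8-Lipschitz Bernoulli mean over actions in [0, 1]."""
    return 0.9 - 0.8 * abs(x - 0.37)


def greedy_net(xs: List[float], eps: float) -> List[float]:
    """1-D greedy eps-net: keep x only if no kept point lies within eps."""
    reps: List[float] = []
    for x in xs:
        if all(abs(x - r) > eps for r in reps):
            reps.append(x)
    return reps


full = [k / 100 for k in range(101)]
subset = greedy_net(full, eps=0.1)
# Covering argument: the best subset arm is within 0.8 * eps of the overall best.
gap = max(map(mean_payoff, full)) - max(map(mean_payoff, subset))
T = 2000
for name, policy in [("UCB1", ucb1), ("Thompson", thompson)]:
    r_full = policy([mean_payoff(x) for x in full], T, random.Random(0))
    r_sub = policy([mean_payoff(x) for x in subset], T, random.Random(0))
    print(f"{name}: full={r_full / T:.3f} subset={r_sub / T:.3f}")
print(f"{len(subset)} of {len(full)} actions kept, best-arm gap {gap:.3f}")
```

With a fixed horizon, the subset trades a small loss in the best achievable mean for far less exploration, since UCB1 alone must try every arm at least once. This mirrors the trade-off the paper's guarantees are meant to quantify.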