🤖 AI Summary
For meta multi-armed bandits with large action spaces, this paper proposes a framework for selecting a representative subset of actions that approximates full-action-space performance while drastically reducing the number of actions considered. Methodologically, it presents the first integration of Gaussian process-based action-similarity modeling with ε-net sampling, yielding a theoretically grounded and computationally efficient selection mechanism, further enhanced by meta-learning for policy optimization over the constrained action set. The paper establishes a tight theoretical upper bound on the performance loss induced by the selected subset. Empirically, the approach retains over 95% of the original reward under 90% action compression and consistently outperforms Thompson Sampling and UCB baselines across multiple benchmarks. The core contribution is the first action-space compression paradigm for bandits that simultaneously offers rigorous theoretical guarantees and practical efficacy.
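To make the Gaussian process similarity assumption concrete: under a GP payoff model, the kernel value between two action feature vectors is the prior covariance of their payoffs, so nearby actions have strongly correlated rewards. Below is a minimal sketch using an RBF kernel; the function name and the lengthscale parameter are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def rbf_similarity(actions, lengthscale=0.5):
    """Pairwise RBF kernel k(a, a') = exp(-||a - a'||^2 / (2 * l^2)).

    Under a GP payoff model, this is the prior covariance between the
    payoffs of two actions; values near 1 indicate near-identical
    payoffs. `lengthscale` is a hypothetical free parameter.
    """
    # Squared Euclidean distances between all pairs of action vectors.
    sq_dists = np.sum((actions[:, None, :] - actions[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq_dists / (2.0 * lengthscale ** 2))

# Three actions in 2-D feature space: two close together, one far away.
actions = np.array([[0.0, 0.0], [0.1, 0.0], [1.0, 1.0]])
K = rbf_similarity(actions)
# Nearby actions (rows 0 and 1) get similarity close to 1;
# distant actions (rows 0 and 2) get similarity close to 0.
```

This is the structural assumption the selection mechanism exploits: if two actions are highly similar under the kernel, keeping only one of them loses little reward.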
📄 Abstract
We study the problem of selecting a subset from a large action space shared by a family of bandits, with the goal of achieving performance nearly matching that of using the full action space. We assume that similar actions tend to have related payoffs, modeled by a Gaussian process. To exploit this structure, we propose a simple ε-net algorithm to select a representative subset. We provide theoretical guarantees for its performance and compare it empirically to Thompson Sampling and Upper Confidence Bound.
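The representative-subset step can be illustrated with the standard greedy ε-net construction: scan the actions and keep one only if it is farther than ε from every representative kept so far, so that every action ends up within ε of some representative. A minimal sketch, assuming actions are feature vectors with Euclidean distance; the function name and parameters are hypothetical, not the paper's exact algorithm.

```python
import numpy as np

def greedy_epsilon_net(actions, eps):
    """Greedily select indices of representatives so every action lies
    within Euclidean distance `eps` of some selected representative.

    `actions`: (n, d) array of action feature vectors (an assumption;
    the paper works with a similarity structure induced by a GP).
    """
    reps = []
    for i, a in enumerate(actions):
        # Keep `a` as a new representative only if no existing
        # representative already covers it within radius eps.
        if all(np.linalg.norm(a - actions[j]) > eps for j in reps):
            reps.append(i)
    return reps

# Usage: compress a random 2-D action space.
rng = np.random.default_rng(0)
actions = rng.uniform(0.0, 1.0, size=(500, 2))
reps = greedy_epsilon_net(actions, eps=0.2)
# By construction, every action is within eps of some representative,
# and the representative set is much smaller than the full action space.
```

The coverage radius ε trades off compression against fidelity: a larger ε yields fewer representatives but allows more payoff to be lost, which is what the paper's upper bound on performance loss quantifies.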