Combinatorial Multivariant Multi-Armed Bandits with Applications to Episodic Reinforcement Learning and Beyond

📅 2024-06-03

🏛️ International Conference on Machine Learning

🤖 AI Summary

This paper addresses the combinatorial multi-armed bandit (CMAB) problem by proposing CMAB-MT, a framework in which each arm's outcome is a $d$-dimensional multivariant random variable and feedback follows a general probabilistic arm-triggering process. Methodologically, it proposes a 1-norm multivariant and triggering probability-modulated smoothness condition for the reward function, designs the optimistic CUCB-MT algorithm on top of this condition, and establishes its regret upper bound. The contributions are threefold: (1) it models multivariant outcomes and probabilistic triggering in a single framework, extending the modeling power of CMAB; (2) it builds the first connection between episodic reinforcement learning (RL) and CMAB, offering a new angle on episodic RL; (3) in applications including episodic RL and probabilistic maximum coverage for goods distribution, it achieves regret bounds that match or improve upon existing results.
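
For intuition, a condition of this kind can be written in the style of the triggering probability modulated (TPM) smoothness used in earlier CMAB work with probabilistically triggered arms, with the scalar gap $|\mu_i - \mu'_i|$ replaced by an ℓ¹ distance between mean vectors. The form below is an assumed illustration, not the paper's exact statement: $S$ is the chosen super arm, $\boldsymbol{\mu}_i \in [0,1]^d$ is the mean outcome vector of base arm $i \in [m]$, $p_i^{\boldsymbol{\mu},S}$ is the probability that arm $i$ is triggered when $S$ is played, and $B$ is a smoothness constant.

$$\left| r(S;\boldsymbol{\mu}) - r(S;\boldsymbol{\mu}') \right| \le B \sum_{i=1}^{m} p_i^{\boldsymbol{\mu},S} \left\lVert \boldsymbol{\mu}_i - \boldsymbol{\mu}'_i \right\rVert_1$$

Weighting each arm's estimation error by its triggering probability means that arms which are rarely triggered, and therefore rarely observed, contribute correspondingly little to the reward gap.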

📝 Abstract

We introduce a novel framework of combinatorial multi-armed bandits (CMAB) with multivariant and probabilistically triggering arms (CMAB-MT), where the outcome of each arm is a $d$-dimensional multivariant random variable and the feedback follows a general arm triggering process. Compared with existing CMAB works, CMAB-MT not only enhances the modeling power but also allows improved results by leveraging distinct statistical properties for multivariant random variables. For CMAB-MT, we propose a general 1-norm multivariant and triggering probability-modulated smoothness condition, and an optimistic CUCB-MT algorithm built upon this condition. Our framework can include many important problems as applications, such as episodic reinforcement learning (RL) and probabilistic maximum coverage for goods distribution, all of which meet the above smoothness condition and achieve matching or improved regret bounds compared to existing works. Through our new framework, we build the first connection between the episodic RL and CMAB literature by offering a new angle to solve episodic RL through the lens of CMAB, which may encourage more interactions between these two important directions.
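
Probabilistic maximum coverage, one of the applications named above, can be illustrated with a toy model. The modeling choices here are assumptions for illustration: arm $i$ (for instance, a hypothetical distribution center) has a $d$-dimensional 0/1 outcome whose $j$-th coordinate indicates whether it reaches target $j$, with mean vector `P[i]`, and arms act independently. The expected reward of a set $S$ is then $\sum_j \bigl(1 - \prod_{i \in S} (1 - P_{ij})\bigr)$, a monotone submodular function, so a greedy oracle gives a $(1 - 1/e)$-approximation. The helper names `expected_coverage` and `greedy_oracle` are hypothetical.

```python
import itertools
import numpy as np

def expected_coverage(P, S):
    """Expected number of targets covered by the arms in S.

    P[i, j] is the assumed probability that arm i reaches target j, i.e. the mean
    of coordinate j of arm i's d-dimensional 0/1 outcome; arms are independent.
    """
    S = list(S)
    if not S:
        return 0.0
    miss = np.prod(1.0 - P[S], axis=0)  # Pr(target j is reached by no arm in S)
    return float(np.sum(1.0 - miss))

def greedy_oracle(P, k):
    """Greedy (1 - 1/e)-approximation oracle for the submodular coverage reward."""
    S = []
    for _ in range(k):
        candidates = [i for i in range(P.shape[0]) if i not in S]
        S.append(max(candidates, key=lambda i: expected_coverage(P, S + [i])))
    return S

# Usage: compare the greedy set with a brute-force optimum on a small instance.
rng = np.random.default_rng(0)
P = 0.6 * rng.random((6, 5))  # 6 arms, d = 5 targets
S_greedy = greedy_oracle(P, k=2)
S_best = max(itertools.combinations(range(6), 2), key=lambda S: expected_coverage(P, S))

# Numerical check of 1-norm smoothness with B = 1 (every selected arm is observed here).
Q = 0.6 * rng.random((6, 5))
gap = abs(expected_coverage(P, S_greedy) - expected_coverage(Q, S_greedy))
l1 = np.abs(P[S_greedy] - Q[S_greedy]).sum()
print(gap <= l1)
```

The smoothness check holds in general for this toy reward: each partial derivative $\partial r / \partial P_{ij} = \prod_{l \in S, l \ne i} (1 - P_{lj})$ lies in $[0,1]$, so the reward is 1-Lipschitz in the summed ℓ¹ distance of the selected arms' mean vectors.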

Problem

Modeling combinatorial multi-armed bandits with multivariant outcomes

Solving episodic reinforcement learning through the CMAB framework

Improving regret bounds for problems with probabilistically triggering arms
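
The episodic RL connection can be pictured with a tabular MDP. The mapping below is an assumption made for illustration: each base arm is a step-state-action triple $(h, s, a)$ whose $d$-dimensional outcome ($d$ = number of states) is the one-hot next state, so its mean vector is the transition distribution $P_h(\cdot \mid s, a)$. Arm $(h, s, a)$ is then triggered exactly when the episode is in $s$ at step $h$ and plays $a$, and the expected return becomes a triggering-probability-weighted sum of mean rewards. The sketch computes these triggering probabilities with a forward pass and cross-checks the result against backward dynamic programming; all function names are hypothetical.

```python
import numpy as np

def visitation_probs(P, policy, init):
    """Forward pass: q[h, s] = Pr(episode is in state s at step h).

    P: shape (H, S, A, S); P[h, s, a] is the next-state distribution, i.e. the
       mean of the assumed one-hot (d = S) outcome vector of arm (h, s, a).
    policy: shape (H, S), deterministic action per step and state.
    init: shape (S,), initial state distribution.
    """
    H, S, _, _ = P.shape
    q = np.zeros((H, S))
    q[0] = init
    for h in range(H - 1):
        for s in range(S):
            q[h + 1] += q[h, s] * P[h, s, policy[h, s]]
    return q

def triggering_probs(q, policy, A):
    """Arm (h, s, a) is triggered iff the episode is in s at step h and plays a."""
    H, S = q.shape
    p = np.zeros((H, S, A))
    for h in range(H):
        for s in range(S):
            p[h, s, policy[h, s]] = q[h, s]
    return p

def value_via_triggering(P, R, policy, init):
    """Expected return = sum over arms of triggering probability * mean reward."""
    p = triggering_probs(visitation_probs(P, policy, init), policy, R.shape[2])
    return float(np.sum(p * R))

def value_via_backward_dp(P, R, policy, init):
    """Standard backward dynamic programming, used as a cross-check."""
    H, S, _, _ = P.shape
    idx = np.arange(S)
    V = np.zeros(S)
    for h in reversed(range(H)):
        a = policy[h]
        V = R[h, idx, a] + P[h, idx, a] @ V
    return float(init @ V)

# Usage on a random tabular MDP (H steps, S states, A actions).
rng = np.random.default_rng(0)
H, S, A = 3, 4, 2
P = rng.random((H, S, A, S))
P /= P.sum(axis=-1, keepdims=True)
R = rng.random((H, S, A))
policy = rng.integers(0, A, size=(H, S))
init = np.full(S, 1.0 / S)
print(value_via_triggering(P, R, policy, init), value_via_backward_dp(P, R, policy, init))
```

Both functions compute $\sum_h \sum_s q_h(s)\, r_h(s, \pi_h(s))$. In the triggering view, an estimation error in `P[h, s, a]` matters only in proportion to how often $(h, s, a)$ is triggered, which is the kind of weighting a triggering-probability-modulated smoothness condition captures.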

Innovation

CMAB framework (CMAB-MT) with multivariant and probabilistically triggering arms

1-norm multivariant and triggering probability-modulated smoothness condition

Optimistic CUCB-MT algorithm for improved regret bounds
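
The optimistic principle behind a CUCB-style algorithm can be sketched as a toy loop: keep an empirical mean vector per base arm, inflate it to an upper confidence vector, call an oracle on the optimistic estimates, and update only the arms that were actually triggered. Everything concrete here is an assumption for illustration: the additive reward `r(S; mu) = sum_i trig_probs[i] * weights @ mu[i]`, known triggering probabilities that do not depend on outcomes, independent Bernoulli coordinates, and the classic CUCB confidence radius $\sqrt{3 \ln t / (2 N_i)}$. The paper's CUCB-MT may use a different, multivariant-specific bonus, and `cucb_mt_sketch` is a hypothetical name.

```python
import numpy as np

def cucb_mt_sketch(true_means, trig_probs, weights, k, T, seed=0):
    """Toy optimistic (CUCB-style) loop with d-dimensional outcomes and triggering.

    Hypothetical setting: arm i has mean vector true_means[i] in [0, 1]^d and, once
    selected, is triggered (observed) independently with known probability
    trig_probs[i]. Reward of super arm S: sum_{i in S} trig_probs[i] * weights @ mu[i].
    Returns per-round pseudo-regret and the number of observations per arm.
    """
    rng = np.random.default_rng(seed)
    m, d = true_means.shape
    counts = np.zeros(m)
    sums = np.zeros((m, d))
    value = trig_probs * (true_means @ weights)  # true contribution of each arm
    opt = np.sort(value)[-k:].sum()              # additive reward: top-k is optimal
    regret = np.zeros(T)
    for t in range(1, T + 1):
        seen = counts[:, None] > 0
        means = sums / np.maximum(counts, 1)[:, None]
        radius = np.sqrt(3 * np.log(t) / (2 * np.maximum(counts, 1)))[:, None]
        # Optimistic mean vectors, clipped to [0, 1]; unobserved arms get all ones.
        ucb = np.where(seen, np.minimum(means + radius, 1.0), 1.0)
        # Oracle on optimistic estimates (exact top-k for this additive reward).
        S = np.argsort(trig_probs * (ucb @ weights))[-k:]
        regret[t - 1] = opt - value[S].sum()
        # Semi-bandit feedback: only triggered arms reveal their outcome vectors.
        for i in S:
            if rng.random() < trig_probs[i]:
                sums[i] += (rng.random(d) < true_means[i]).astype(float)
                counts[i] += 1
    return regret, counts

# Usage: 6 arms with d = 3, choose k = 2 per round.
rng = np.random.default_rng(1)
true_means = rng.random((6, 3))
trig_probs = np.array([0.9, 0.5, 0.8, 0.3, 0.7, 0.6])
weights = np.array([0.5, 0.3, 0.2])
regret, counts = cucb_mt_sketch(true_means, trig_probs, weights, k=2, T=2000)
print(regret[:200].mean(), regret[-200:].mean())
```

This toy reward is 1-norm smooth with triggering-probability weights and constant $B = \max_j |w_j|$, because $|\mathbf{w}^\top \mathbf{x}| \le \lVert \mathbf{w} \rVert_\infty \lVert \mathbf{x} \rVert_1$; using an exact top-$k$ oracle keeps the sketch focused on the optimism and triggering logic.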
