Multi-Player Approaches for Dueling Bandits

๐Ÿ“… 2024-05-25
๐Ÿ›๏ธ arXiv.org
๐Ÿ“ˆ Citations: 1
โœจ Influential: 1
๐Ÿ“„ PDF
๐Ÿค– AI Summary
This work addresses the challenge of collaborative exploration of non-informative arm pairs in multi-player preference-feedback dueling bandits. Methodologically, it introduces the first theoretically optimal distributed solution: (i) it adapts the Follow-Your-Leader black-box framework to the multi-player dueling setting, achieving the fundamental regret lower bound; and (ii) it designs a distributed protocol leveraging message passing and Condorcet winner recommendation to enable efficient coordination. The approach integrates techniques from multi-agent reinforcement learning, Double Thompson Sampling, and distributed decision-making. Empirical evaluation across multiple benchmark tasks demonstrates that the proposed algorithm significantly outperforms single-player baselinesโ€”achieving 37%โ€“62% faster convergence and reducing cumulative regret by 41%โ€“58%. These results validate its dual advantage: rigorous theoretical guarantees and superior practical performance.

Technology Category

Application Category

๐Ÿ“ Abstract
Various approaches have emerged for multi-armed bandits in distributed systems. The multiplayer dueling bandit problem, common in scenarios with only preference-based information like human feedback, introduces challenges related to controlling collaborative exploration of non-informative arm pairs, but has received little attention. To fill this gap, we demonstrate that the direct use of a Follow Your Leader black-box approach matches the lower bound for this setting when utilizing known dueling bandit algorithms as a foundation. Additionally, we analyze a message-passing fully distributed approach with a novel Condorcet-winner recommendation protocol, resulting in expedited exploration in many cases. Our experimental comparisons reveal that our multiplayer algorithms surpass single-player benchmark algorithms, underscoring their efficacy in addressing the nuanced challenges of the multiplayer dueling bandit setting.
Problem

Research questions and friction points this paper is trying to address.

Addressing multiplayer dueling bandits with preference-based feedback
Optimizing collaborative exploration of non-informative arm pairs
Enhancing distributed algorithms for Condorcet-winner recommendation protocols
Innovation

Methods, ideas, or system contributions that make the work stand out.

Follow Your Leader black-box approach
message-passing fully distributed approach
Condorcet-winner recommendation protocol
๐Ÿ”Ž Similar Papers
No similar papers found.