Optimal Multi-Objective Best Arm Identification with Fixed Confidence

📅 2025-01-23

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This paper studies a pure-exploration problem in multi-objective multi-armed bandits: identifying the best arm of every objective (the arm with the largest mean in that dimension) under a fixed-confidence constraint, that is, minimizing the expected stopping time while keeping the error probability below a prescribed level. Whereas most prior work on multi-objective pure exploration targets Pareto frontier identification, this paper gives a formal treatment of the multi-objective best arm identification (MO-BAI) problem. It derives a problem-dependent lower bound on the growth rate of the expected stopping time as the error probability vanishes; the bound is characterised by a max-min optimisation problem that is computationally expensive to solve at every time step. The proposed algorithm samples arms using "surrogate proportions", which removes the need to solve the max-min problem at each step, and is shown to be asymptotically optimal. Empirical studies support the algorithm's efficiency.
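To make the identification target concrete, here is a minimal Python sketch that follows the abstract's definition of the best arm of an objective (the arm with the largest mean component in that objective). The function names, the example mean matrix, the unit-variance Gaussian noise, and the uniform sampling rule are all assumptions made for illustration; this is a naive baseline, not the paper's algorithm.

```python
import numpy as np

def best_arms_per_objective(means):
    """For each objective (column), return the index of the arm with the largest mean."""
    means = np.asarray(means, dtype=float)  # shape: (K arms, M objectives)
    return means.argmax(axis=0)

def uniform_sampling_estimate(means, pulls_per_arm, seed=0):
    """Pull every arm equally often and guess the best arm of each objective.

    Assumption: each pull returns an M-vector whose components are independent
    unit-variance Gaussians centred at the arm's mean vector.
    """
    rng = np.random.default_rng(seed)
    means = np.asarray(means, dtype=float)
    n_arms, n_objectives = means.shape
    samples = rng.normal(
        loc=means[:, None, :],  # broadcast over the pulls axis
        scale=1.0,
        size=(n_arms, pulls_per_arm, n_objectives),
    )
    empirical_means = samples.mean(axis=1)
    return best_arms_per_objective(empirical_means)

# Hypothetical instance: 3 arms, 2 objectives.
means = [[1.0, 0.2],
         [0.5, 0.9],
         [0.1, 0.4]]
print(best_arms_per_objective(means))  # [0 1]
```

Uniform sampling ignores how hard each objective is, which is exactly the inefficiency that an adaptive sampling rule with a stopping criterion is meant to remove.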

📝 Abstract

We consider a multi-armed bandit setting with finitely many arms, in which each arm yields an $M$-dimensional vector reward upon selection. We assume that the reward of each dimension (a.k.a. objective) is generated independently of the others. The best arm of any given objective is the arm with the largest component of mean corresponding to the objective. The end goal is to identify the best arm of every objective in the shortest (expected) time subject to an upper bound on the probability of error (i.e., fixed-confidence regime). We establish a problem-dependent lower bound on the limiting growth rate of the expected stopping time, in the limit of vanishing error probabilities. This lower bound, we show, is characterised by a max-min optimisation problem that is computationally expensive to solve at each time step. We propose an algorithm that uses the novel idea of surrogate proportions to sample the arms at each time step, eliminating the need to solve the max-min optimisation problem at each step. We demonstrate theoretically that our algorithm is asymptotically optimal. In addition, we provide extensive empirical studies to substantiate the efficiency of our algorithm. While existing works on pure exploration with multi-objective multi-armed bandits predominantly focus on Pareto frontier identification, our work fills the gap in the literature by conducting a formal investigation of the multi-objective best arm identification problem.
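The abstract does not state the max-min problem explicitly, so the following Python sketch only illustrates what such a problem can look like and why re-solving it at every time step is costly. It assumes unit-variance Gaussian rewards and borrows the classical pairwise cost from single-objective best arm identification, taking the minimum over all objectives and challenger arms; that multi-objective form, the function names, and the crude random search over allocations are assumptions for illustration, not the paper's expressions or method.

```python
import numpy as np

def min_transport_cost(w, means):
    """Inner 'min': the smallest pairwise cost over objectives and challenger arms.

    Assumption: unit-variance Gaussian rewards, so separating the best arm i of
    objective m from a challenger a costs
        (w[i] * w[a] / (w[i] + w[a])) * (means[i, m] - means[a, m])**2 / 2.
    """
    w = np.asarray(w, dtype=float)
    means = np.asarray(means, dtype=float)
    n_arms, n_objectives = means.shape
    best = means.argmax(axis=0)
    cost = np.inf
    for m in range(n_objectives):
        i = best[m]
        for a in range(n_arms):
            if a == i:
                continue
            total = w[i] + w[a]
            # An allocation that never samples either arm cannot separate them.
            harmonic = w[i] * w[a] / total if total > 0 else 0.0
            gap = means[i, m] - means[a, m]
            cost = min(cost, harmonic * gap**2 / 2.0)
    return cost

def approx_optimal_allocation(means, n_candidates=5000, seed=0):
    """Outer 'max': crude random search over the probability simplex."""
    rng = np.random.default_rng(seed)
    n_arms = np.asarray(means).shape[0]
    candidates = np.vstack([
        np.full(n_arms, 1.0 / n_arms),  # always include the uniform allocation
        rng.dirichlet(np.ones(n_arms), size=n_candidates),
    ])
    values = [min_transport_cost(w, means) for w in candidates]
    j = int(np.argmax(values))
    return candidates[j], float(values[j])

# Hypothetical instance: 3 arms, 2 objectives.
means = [[1.0, 0.2],
         [0.5, 0.9],
         [0.1, 0.4]]
w_star, value = approx_optimal_allocation(means)
print(w_star, value)
```

Even this toy search evaluates thousands of candidate allocations for a single mean matrix. An algorithm that re-solved the problem with fresh empirical means at every time step would pay that cost repeatedly, which is the burden the surrogate-proportions idea is described as avoiding.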

Problem

Research questions and friction points this paper is trying to address.

Multi-objective Decision Making

Optimization

Pure Exploration (Fixed Confidence)

Innovation

Methods, ideas, or system contributions that make the work stand out.

Surrogate Proportions

Multi-objective Bandit Problems

Theoretically Optimal Algorithm
