Balancing Performance and Costs in Best Arm Identification

📅 2025-05-26
📈 Citations: 0
Influential: 0
🤖 AI Summary
In best-arm identification for multi-armed bandits, practitioners must choose between fixed-budget and fixed-confidence formulations and then set a budget or confidence level, with little guidance on either choice. This paper introduces a risk-minimization paradigm. It defines a risk that sums the sampling cost (a cost per observation) and a performance penalty for recommending a suboptimal arm, so that accuracy and resource use are optimized jointly. Theoretically, it derives lower bounds on this risk for two performance penalties: the misidentification probability and the simple regret. Algorithmically, it proposes DBCARE, an adaptive algorithm whose risk matches these lower bounds up to polylogarithmic factors on nearly all problem instances. Experiments on simulated models compare DBCARE with fixed-budget and fixed-confidence algorithms and illustrate the shortfalls of those paradigms under this objective. Because the framework removes the need to pick a budget or confidence level, it suits profit-maximizing applications such as A/B testing.
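One way to write the risk described above, in our own notation rather than necessarily the paper's, assuming a constant cost c > 0 per observation: for a policy π with stopping time τ, recommended arm \hat a_τ, arm means μ_1, …, μ_K, and best arm a^⋆ = argmax_k μ_k,

```latex
R(\pi) \;=\; c\,\mathbb{E}_\pi[\tau] \;+\; \mathbb{E}_\pi\big[\ell(\hat a_\tau)\big],
\qquad
\ell_{\mathrm{err}}(\hat a) = \mathbf{1}\{\hat a \neq a^\star\},
\qquad
\ell_{\mathrm{sr}}(\hat a) = \mu_{a^\star} - \mu_{\hat a}.
```

With ℓ_err the penalty term is the probability of misidentification, P(\hat a_τ ≠ a^⋆); with ℓ_sr it is the expected simple regret. In this formulation a larger c favors stopping earlier and accepting a larger expected penalty, while a smaller c justifies more sampling.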

📝 Abstract
We consider the problem of identifying the best arm in a multi-armed bandit model. Despite a wealth of literature on the traditional fixed-budget and fixed-confidence regimes of the best arm identification (BAI) problem, it remains unclear to most practitioners how to choose an approach and the corresponding budget or confidence parameter. We propose a new formalism that avoids this dilemma altogether by minimizing a risk functional which explicitly balances the performance of the recommended arm against the cost incurred in learning it. In this framework, a cost is incurred for each observation during the sampling phase, and upon recommending an arm, a performance penalty is incurred if that arm is suboptimal. The learner's goal is to minimize the sum of the penalty and the cost. This new regime mirrors the priorities of many practitioners, e.g. maximizing profit in an A/B testing framework, better than the classical fixed-budget or fixed-confidence settings do. We derive theoretical lower bounds on the risk for each of two choices of performance penalty, the probability of misidentification and the simple regret, and propose an algorithm called DBCARE that matches these lower bounds up to polylog factors on nearly all problem instances. We then demonstrate the performance of DBCARE on a number of simulated models, comparing it to fixed-budget and fixed-confidence algorithms to show the shortfalls of existing BAI paradigms on this problem.
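To make the cost/penalty trade-off concrete, here is a minimal Monte Carlo sketch that scores a naive fixed-budget baseline under this risk. It is not DBCARE, whose sampling and stopping rules are not described here. Assumptions (all ours): rewards are unit-variance Gaussian, every pull costs the same amount `cost`, the baseline pulls each arm `n_per_arm` times and recommends the empirical best, and the function name `uniform_allocation_risk` and its penalty labels are hypothetical.

```python
import random


def uniform_allocation_risk(means, n_per_arm, cost, penalty="simple_regret",
                            n_trials=500, seed=0):
    """Monte Carlo estimate of cost * E[tau] + E[penalty] for a hypothetical
    uniform-allocation, fixed-budget baseline (not the paper's algorithm).

    Assumes unit-variance Gaussian rewards and a constant cost per pull.
    penalty: "simple_regret" (mu_best - mu_recommended) or
             "misidentification" (1 if the recommended arm is not the best).
    """
    if penalty not in ("simple_regret", "misidentification"):
        raise ValueError(f"unknown penalty: {penalty}")
    rng = random.Random(seed)
    k = len(means)
    best = max(range(k), key=lambda a: means[a])
    total_penalty = 0.0
    for _ in range(n_trials):
        # Empirical mean of n_per_arm Gaussian draws for each arm.
        emp = [sum(rng.gauss(mu, 1.0) for _ in range(n_per_arm)) / n_per_arm
               for mu in means]
        rec = max(range(k), key=lambda a: emp[a])  # recommend empirical best
        if penalty == "misidentification":
            total_penalty += float(rec != best)
        else:
            total_penalty += means[best] - means[rec]
    # For this baseline tau is deterministic: every arm gets n_per_arm pulls.
    sampling_cost = cost * k * n_per_arm
    return sampling_cost + total_penalty / n_trials
```

Sweeping the budget, e.g. `for n in (1, 10, 100, 400): print(n, uniform_allocation_risk([0.5, 0.4, 0.3], n, cost=1e-4))`, should show the penalty term shrinking while the cost term grows linearly in n. The budget that minimizes the sum depends on the unknown gaps between arm means, which is the tuning problem the risk formulation aims to remove by letting an adaptive algorithm decide when to stop.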
Problem

Research questions and friction points this paper is trying to address.

Balancing performance and costs in best arm identification
Choosing an approach (fixed budget or fixed confidence) and its parameter in multi-armed bandit models
Minimizing a risk functional for arm recommendation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Minimizes a risk that balances performance and cost
Proposes DBCARE, which matches the risk lower bounds up to polylog factors
Charges a cost per observation plus a penalty for a suboptimal recommendation