Risk-Averse Best Arm Set Identification with Fixed Budget and Fixed Confidence

📅 2025-06-27
🤖 AI Summary
This paper addresses risk-sensitive multi-armed bandit optimization: identifying the Pareto-optimal set of arms under the mean-variance criterion, which jointly maximizes expected reward and minimizes risk in uncertain environments. It proposes a unified meta-algorithmic framework that supports both the fixed-confidence and fixed-budget settings with a single sampling strategy. The key contribution is the design of confidence intervals tailored to each setting, backed by theoretical guarantees on the correctness of the returned Pareto-optimal set. Experiments on synthetic benchmarks show that the method outperforms existing baselines in both identification accuracy and sample efficiency.

📝 Abstract
Decision-making in uncertain environments, where one seeks to maximize expected reward while minimizing its risk, is a ubiquitous problem across many fields. Here, we introduce a novel problem setting in stochastic bandit optimization that jointly addresses two critical aspects of decision-making: maximizing expected reward and minimizing associated uncertainty, quantified via the mean-variance (MV) criterion. Unlike traditional bandit formulations that focus solely on expected returns, our objective is to efficiently and accurately identify the Pareto-optimal set of arms that strikes the best trade-off between expected performance and risk. We propose a unified meta-algorithmic framework capable of operating under both fixed-confidence and fixed-budget regimes, which it achieves by tailoring the design of the confidence intervals to each scenario while using the same sample exploration strategy. We provide theoretical guarantees on the correctness of the returned solutions in both settings. To complement this theoretical analysis, we conduct extensive empirical evaluations across synthetic benchmarks, demonstrating that our approach outperforms existing methods in terms of both accuracy and sample efficiency, highlighting its broad applicability to risk-aware decision-making tasks in uncertain environments.
Problem

Research questions and friction points this paper is trying to address.

Maximize expected reward and minimize risk in decision-making
Identify Pareto-optimal arms balancing performance and risk
Develop an adaptive algorithm for fixed-budget and fixed-confidence scenarios
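
To make the notion of a Pareto-optimal arm set under a mean-variance trade-off concrete, here is a minimal Python sketch. It is an illustration only and not the paper's algorithm: it assumes the true (or estimated) mean and variance of each arm are already known, and it assumes the usual dominance rule, in which one arm dominates another if its mean is at least as high and its variance at least as low, with at least one of the two strictly better. The function name `pareto_optimal_arms` and the example numbers are hypothetical.

```python
def pareto_optimal_arms(means, variances):
    """Return the indices of arms that no other arm dominates.

    Assumed dominance rule (not taken from the paper): arm j dominates
    arm i if mean_j >= mean_i and var_j <= var_i, with at least one of
    the two inequalities strict.
    """
    n = len(means)
    optimal = []
    for i in range(n):
        dominated = any(
            means[j] >= means[i]
            and variances[j] <= variances[i]
            and (means[j] > means[i] or variances[j] < variances[i])
            for j in range(n)
        )
        if not dominated:
            optimal.append(i)
    return optimal


# Hypothetical four-arm instance: arm 3 is dominated by arm 1
# (lower mean and higher variance); the other arms are incomparable.
example_means = [0.9, 0.7, 0.5, 0.6]
example_variances = [0.30, 0.10, 0.05, 0.20]
print(pareto_optimal_arms(example_means, example_variances))  # [0, 1, 2]
```

In a bandit setting the means and variances are unknown and must be estimated from samples, which is where the confidence intervals described in the abstract come in; this sketch covers only the final set-selection step.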
Innovation

Methods, ideas, or system contributions that make the work stand out.

Mean-variance criterion for risk-reward trade-off
Unified meta-algorithm for fixed-confidence and fixed-budget regimes
Adaptive confidence intervals design
Shunta Nonaga
Hokkaido University, Hokkaido, Japan
Koji Tabata
Hokkaido University, Hokkaido, Japan
Yuta Mizuno
Hokkaido University, Hokkaido, Japan
Tamiki Komatsuzaki
Hokkaido University
Chemical and Biological Physics, Dynamical Systems Theory, Nonlinear Physics