EVaR-Optimal Arm Identification in Bandits

📅 2025-10-06
🤖 AI Summary
This paper studies fixed-confidence best-arm identification (BAI) in multi-armed bandits under the Entropic Value-at-Risk (EVaR) criterion, targeting risk-averse decision-making in high-stakes domains such as finance. It brings EVaR, previously unexplored in BAI, into the fixed-confidence framework for nonparametric reward distributions bounded in [0,1]. The authors propose a δ-correct Track-and-Stop algorithm whose sampling policy and lower-bound characterization rely on solving a convex optimization problem and a related, simpler non-convex one. Leveraging large-deviations theory, they prove asymptotic optimality: the expected sample complexity asymptotically matches the information-theoretic lower bound. Experiments demonstrate that the algorithm achieves the prescribed confidence level while substantially improving risk control over baseline methods. This work provides the first asymptotically optimal BAI algorithm under the EVaR criterion, with formal guarantees on both δ-correctness and sample complexity.
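
To make the criterion concrete, the sketch below computes a plug-in EVaR estimate from reward samples in [0,1]. It assumes the risk-averse, reward-side convention EVaR_α(X) = sup_{z>0} (ln α - ln E[exp(-zX)]) / z, which is the negative of the standard loss-side EVaR (Ahmadi-Javid, 2012) applied to -X; the paper's exact sign and level conventions may differ. The function name `empirical_evar` and the log-spaced grid over z are illustrative assumptions, not the authors' estimator.

```python
import numpy as np


def empirical_evar(samples, alpha: float, z_grid=None) -> float:
    """Plug-in EVaR of rewards at level alpha (assumed lower-tail convention).

    Assumed definition: EVaR_alpha(X) = sup_{z>0} (ln(alpha) - ln E[exp(-z X)]) / z.
    The supremum is approximated by a maximum over a finite grid of z values,
    so the result is a slight underestimate of the exact supremum.
    """
    x = np.asarray(samples, dtype=float)
    if x.ndim != 1 or x.size == 0:
        raise ValueError("samples must be a non-empty 1-D sequence")
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must lie in (0, 1]")
    if z_grid is None:
        # Hypothetical default: 2001 log-spaced points between 1e-4 and 1e4.
        z_grid = np.logspace(-4, 4, 2001)
    z = np.asarray(z_grid, dtype=float)

    # a[i, j] = -z_i * x_j; each row gives the exponents for one value of z.
    a = -np.outer(z, x)
    # Stable log-mean-exp per row: ln( (1/n) * sum_j exp(a[i, j]) ).
    m = a.max(axis=1)
    log_mgf = m + np.log(np.mean(np.exp(a - m[:, None]), axis=1))
    # Objective value at each grid point; its maximum approximates the supremum.
    values = (np.log(alpha) - log_mgf) / z
    return float(values.max())


if __name__ == "__main__":
    safe = [0.6] * 100               # deterministic reward of 0.6
    risky = [0.0] * 20 + [1.0] * 80  # higher mean (0.8), but 20% mass at 0
    for name, arm in [("safe", safe), ("risky", risky)]:
        print(name, "mean:", np.mean(arm), "EVaR:", empirical_evar(arm, alpha=0.05))
```

Under this convention the estimate lies below the sample mean and approaches the worst observed reward from below; when the probability mass at that worst reward is at least α, the EVaR equals the minimum. That is why the risky arm scores lower than the safe arm here despite its higher mean.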

📝 Abstract
We study the fixed-confidence best arm identification (BAI) problem within the multi-armed bandit (MAB) framework under the Entropic Value-at-Risk (EVaR) criterion. Our analysis considers a nonparametric setting, allowing for general reward distributions bounded in [0,1]. This formulation addresses the critical need for risk-averse decision-making in high-stakes environments, such as finance, moving beyond simple expected value optimization. We propose a $\delta$-correct, Track-and-Stop based algorithm and derive a corresponding lower bound on the expected sample complexity, which we prove is asymptotically matched. The implementation of our algorithm and the characterization of the lower bound both require solving a complex convex optimization problem and a related, simpler non-convex one.

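For orientation, fixed-confidence lower bounds of the kind the abstract mentions usually follow the generic template of Garivier and Kaufmann (2016), sketched below. The notation is assumed rather than taken from the paper: $\nu = (\nu_1, \dots, \nu_K)$ is the bandit instance, $\Sigma_K$ the probability simplex over the $K$ arms, $\tau_\delta$ the stopping time of a $\delta$-correct algorithm, and $\mathrm{Alt}(\nu)$ the set of instances with distributions on $[0,1]$ whose EVaR-optimal arm differs from that of $\nu$. The paper's EVaR-specific characteristic time, and the convex and non-convex problems used to compute it, may take a different form.

$$
\mathbb{E}_{\nu}[\tau_\delta] \;\ge\; T^*(\nu)\,\mathrm{kl}(\delta, 1-\delta),
\qquad
T^*(\nu)^{-1} \;=\; \sup_{w \in \Sigma_K}\; \inf_{\lambda \in \mathrm{Alt}(\nu)} \sum_{a=1}^{K} w_a\,\mathrm{KL}(\nu_a, \lambda_a),
$$

$$
\mathrm{kl}(\delta, 1-\delta) = (1-2\delta)\ln\frac{1-\delta}{\delta} \sim \ln\frac{1}{\delta} \quad (\delta \to 0),
\qquad
\limsup_{\delta \to 0} \frac{\mathbb{E}_{\nu}[\tau_\delta]}{\ln(1/\delta)} \le T^*(\nu).
$$

The first line is the lower bound; the second states what asymptotic optimality means for an algorithm that matches it. The weights $w^*$ attaining the supremum are the sampling proportions a Track-and-Stop rule tries to track.
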
Problem

Identifying the best arm in bandits under the EVaR criterion
Addressing risk-averse decision-making beyond expected value optimization
Solving complex optimization problems for algorithm implementation and bounds

Innovation

EVaR-based risk-averse bandit algorithm design
Nonparametric handling of general reward distributions bounded in [0,1]
Track-and-Stop implementation built on a convex optimization problem and a related non-convex one
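
Track-and-Stop, as introduced by Garivier and Kaufmann (2016), samples arms so that the empirical proportions N_a(t)/t follow optimal weights computed from the current estimates, while forcing a minimum amount of exploration. The sketch below shows only the generic D-tracking sampling rule from that work. The target weights are passed in as an input, because computing them for EVaR requires the paper's optimization problems, which are not reproduced here; the function name `d_tracking_arm` is hypothetical and the fixed weights in the usage lines are illustrative only.

```python
import math
from typing import Sequence


def d_tracking_arm(counts: Sequence[int], target_weights: Sequence[float]) -> int:
    """Pick the next arm with the generic D-tracking rule (Garivier and Kaufmann, 2016).

    counts[a] is the number of pulls of arm a so far; target_weights is assumed to be
    a probability vector supplied by an external (hypothetical) EVaR weight oracle.
    """
    k = len(counts)
    if len(target_weights) != k:
        raise ValueError("counts and target_weights must have the same length")
    t = sum(counts)

    # Forced exploration: arms pulled fewer than sqrt(t) - K/2 times get priority,
    # which keeps every arm's estimate consistent.
    undersampled = [a for a in range(k) if counts[a] < math.sqrt(t) - k / 2]
    if undersampled:
        return min(undersampled, key=lambda a: counts[a])

    # Tracking: pull the arm whose count lags furthest behind t * w_a.
    return max(range(k), key=lambda a: t * target_weights[a] - counts[a])


if __name__ == "__main__":
    counts = [1, 1, 1]        # one initial pull per arm
    weights = [0.5, 0.3, 0.2]  # illustrative fixed weights, not from the paper
    for _ in range(1000):
        counts[d_tracking_arm(counts, weights)] += 1
    total = sum(counts)
    print([round(c / total, 3) for c in counts])
```

In a full Track-and-Stop loop the weights would be recomputed from the empirical reward distributions at every round, and sampling would typically halt once a generalized likelihood ratio statistic exceeds a threshold β(t, δ); both pieces depend on the paper's EVaR analysis and are omitted here.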