Risk-sensitive Bandits: Arm Mixture Optimality and Regret-efficient Algorithms

📅 2025-03-11
📈 Citations: 0
Influential: 0

🤖 AI Summary
This paper addresses the risk-sensitive multi-armed bandit problem and identifies a fundamental limitation of the conventional assumption that a single arm is optimal: under a wide range of distortion riskmetrics, the optimal policy requires mixing across multiple arms. To address this, the authors formally define arm-mixture optimality and prove when it holds, the first such result in the literature. They propose adaptive algorithms that can track either a mixed or a pure (single-arm) optimal policy. Building on an asymptotically optimal sampling design and a risk-aware regret analysis, they establish a regret bound of $O((\log T/T)^{\nu})$, where $T$ is the horizon and $\nu > 0$ is a riskmetric-specific constant, substantially improving upon existing rates. The core contributions are threefold: (i) breaking the single-arm paradigm; (ii) establishing a general, unified framework for risk-sensitive bandits; and (iii) achieving theoretically optimal convergence speed under a broad class of distortion riskmetrics.

📝 Abstract
This paper introduces a general framework for risk-sensitive bandits that unifies risk-sensitive objectives by adopting a rich class of distortion riskmetrics. The framework subsumes various existing risk-sensitive models. An important and hitherto unknown observation is that, for a wide range of riskmetrics, the optimal bandit policy involves selecting a mixture of arms. This is in sharp contrast to the convention in multi-armed bandit algorithms that there is generally a solitary arm that maximizes the utility, whether purely reward-centric or risk-sensitive. It marks a major departure from the usual principles of bandit algorithm design, since there are uncountably many possible mixtures. The contributions of the paper are as follows: (i) it formalizes a general framework for risk-sensitive bandits; (ii) it identifies standard risk-sensitive bandit models for which solitary arm selection is not optimal; and (iii) it designs regret-efficient algorithms whose sampling strategies can accurately track optimal arm mixtures (when a mixture is optimal) or the solitary arm (when a solitary arm is optimal). The algorithms are shown to achieve a regret that scales as $O((\log T/T)^{\nu})$, where $T$ is the horizon and $\nu>0$ is a riskmetric-specific constant.
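To make the mixture phenomenon concrete, here is a small numerical sketch. It is not taken from the paper: the two arms, their reward laws, and the distortion function $h(p) = 1-(1-p)^2$ (a concave "dual-power" distortion under which the riskmetric equals $\mathbb{E}[\max(X, X')]$ for two independent draws) are assumptions chosen for illustration, and the helper names `mixture` and `distortion_riskmetric` are hypothetical. For a discrete reward law it uses the standard Choquet form $\rho_h(X) = \sum_i x_i\,[h(\mathbb{P}(X \ge x_i)) - h(\mathbb{P}(X > x_i))]$ with $h(0)=0$; the paper's exact definition and normalization may differ. Pulling arm $k$ with probability $\alpha_k$ yields the mixture law $\sum_k \alpha_k F_k$, and a grid search over $\alpha$ shows a mixture beating both single arms.

```python
from collections import defaultdict
from typing import Callable, Dict, Sequence

# Illustrative sketch, not the paper's code: toy arms and distortion are assumptions.
# A discrete reward law: support point -> probability.
Dist = Dict[float, float]


def mixture(arms: Sequence[Dist], weights: Sequence[float]) -> Dist:
    """Reward law induced by pulling arm k with probability weights[k]."""
    mix: Dict[float, float] = defaultdict(float)
    for arm, w in zip(arms, weights):
        for x, p in arm.items():
            mix[x] += w * p
    return dict(mix)


def distortion_riskmetric(dist: Dist, h: Callable[[float], float]) -> float:
    """Choquet-form distortion riskmetric of a discrete law, assuming h(0) = 0:
    sum_i x_i * [h(P(X >= x_i)) - h(P(X > x_i))]."""
    total, tail_gt = 0.0, 0.0  # tail_gt holds P(X > x) while walking downward
    for x in sorted(dist, reverse=True):
        tail_ge = tail_gt + dist[x]  # P(X >= x)
        total += x * (h(tail_ge) - h(tail_gt))
        tail_gt = tail_ge
    return total


def h(p: float) -> float:
    """Assumed dual-power distortion: rho_h(X) = E[max(X, X')] for i.i.d. X, X'."""
    return 1.0 - (1.0 - p) ** 2


arm_a: Dist = {0.6: 1.0}            # hypothetical arm: deterministic reward 0.6
arm_b: Dist = {0.0: 0.5, 1.0: 0.5}  # hypothetical arm: fair coin paying 0 or 1
arms = [arm_a, arm_b]

single = [distortion_riskmetric(arm, h) for arm in arms]
# Grid search over alpha = probability of pulling arm_a.
best_alpha, best_val = max(
    ((i / 100, distortion_riskmetric(mixture(arms, [i / 100, 1 - i / 100]), h))
     for i in range(101)),
    key=lambda pair: pair[1],
)
print("single arms:", [f"{v:.4f}" for v in single])  # ['0.6000', '0.7500']
print(f"best mixture: alpha={best_alpha:.2f}, value={best_val:.4f}")  # alpha=0.20, value=0.7600
```

Under the plain mean ($h(p) = p$) the objective is linear in $\alpha$, so a single arm is always among the optimizers; in this toy case it is the curvature of $h$ that makes the interior mixture ($\alpha = 0.2$, value $0.76$) strictly better than either arm alone.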
Problem

Research questions and friction points this paper is trying to address.

Introduces a framework for risk-sensitive bandits using distortion riskmetrics.
Identifies conditions where optimal policies require arm mixtures, not single arms.
Designs regret-efficient algorithms for tracking optimal arm mixtures or single arms.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Integrates distortion riskmetrics for risk-sensitive objectives
Identifies optimal bandit policies involving arm mixtures
Designs regret-efficient algorithms tracking optimal arm mixtures
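The summary does not spell out the authors' sampling rule. As a hedged illustration of what "tracking" a mixture can mean, the sketch below swaps in a generic rule familiar from best-arm identification: at each round, pull the arm whose count lags furthest behind its target share $t\alpha_k$. The function name `track_mixture` is hypothetical, and the target weights are assumed known; an actual bandit algorithm would have to estimate them from data and re-solve for the optimal mixture as its estimates improve. Because the pulled arm always has a positive deficit, no count ever runs more than one pull ahead of $t\alpha_k$ or more than $K-1$ pulls behind it, so empirical pull frequencies converge to the target at rate $O(K/t)$.

```python
from typing import List, Sequence


def track_mixture(target: Sequence[float], horizon: int) -> List[int]:
    """Generic deterministic tracking (an assumption, not the paper's rule):
    each round, pull the arm whose count lags furthest behind t * target[k].
    Returns the final pull counts."""
    if abs(sum(target) - 1.0) > 1e-9 or min(target) < 0:
        raise ValueError("target must be a probability vector")
    counts = [0] * len(target)
    for t in range(1, horizon + 1):
        # deficit of arm k relative to its target share after t rounds
        deficits = [t * a - n for a, n in zip(target, counts)]
        k = max(range(len(target)), key=deficits.__getitem__)
        counts[k] += 1
    return counts


counts = track_mixture([0.2, 0.8], horizon=1000)
print(counts)  # [200, 800]
```

The same rule tracks a single arm when the target is a vertex such as `[0.0, 1.0]`, which mirrors the requirement that one sampler handle both the mixed and the solitary case.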