🤖 AI Summary
Optimizing tail-risk metrics, such as Conditional Value-at-Risk (CVaR) and threshold probabilities, in Markov Decision Processes (MDPs) is computationally expensive. The one family of risk measures that dynamic programming handles efficiently, Entropic Risk Measures (EntRM), depends on a risk parameter that is hard to interpret and therefore hard to choose.
Method: This paper introduces a parametric dynamic programming framework grounded in Entropic Risk Measures (EntRM). Using a novel structural analysis and smoothness properties of entropic risks, it shows that the full set of EntRM-optimal policies across values of the risk parameter (the optimality front) can be computed efficiently, and that this front tightly approximates the otherwise intractable tail-risk objectives.
Contribution/Results: The framework is both theoretically grounded and computationally practical: it approximates CVaR and related metrics with high accuracy in near-linear time across diverse risk-sensitive decision tasks. It improves risk control while making policies easier to interpret and giving a principled way to adjust risk sensitivity.
📝 Abstract
Risk-sensitive planning aims to identify policies that maximize a tail-focused metric in Markov Decision Processes (MDPs). This optimization can be very costly for the most widely used and interpretable metrics, such as threshold probabilities or (Conditional) Value at Risk. Indeed, previous work showed that only Entropic Risk Measures (EntRM) can be efficiently optimized through dynamic programming, which leaves a hard-to-interpret parameter to choose. We show that computing the full set of optimal policies for EntRM across parameter values leads to tight approximations for the metrics of interest. We prove that this optimality front can be computed effectively thanks to a novel structural analysis and smoothness properties of entropic risks. Empirical results demonstrate that our approach achieves strong performance in a variety of decision-making scenarios.
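To make the idea in the abstract concrete, the following is a minimal sketch, not the paper's algorithm. It assumes the standard definition EntRM_β[X] = (1/β) log E[exp(βX)] (β < 0 is risk-averse for rewards, β → 0 recovers the mean), a finite-horizon MDP with deterministic rewards per state-action pair, and a plain grid over β in place of the exact optimality front the paper computes. All function names, the toy MDP, and the grid are illustrative assumptions.

```python
import numpy as np

def entrm(values, probs, beta):
    """Entropic risk (1/beta) * log E[exp(beta * X)]; beta = 0 gives the mean."""
    values, probs = np.asarray(values, float), np.asarray(probs, float)
    if abs(beta) < 1e-12:
        return float(probs @ values)
    z = beta * values
    m = z.max()  # shift for numerical stability (log-sum-exp trick)
    return float((m + np.log(probs @ np.exp(z - m))) / beta)

def entrm_dp(P, r, horizon, beta):
    """Backward induction for EntRM. P[a, s, s'] transitions, r[s, a] rewards."""
    n_actions, n_states, _ = P.shape
    V = np.zeros(n_states)
    policy = np.zeros((horizon, n_states), dtype=int)
    for t in reversed(range(horizon)):
        Q = np.array([[r[s, a] + entrm(V, P[a, s], beta)
                       for a in range(n_actions)] for s in range(n_states)])
        policy[t], V = Q.argmax(axis=1), Q.max(axis=1)
    return policy, V

def return_distribution(P, r, policy, s0):
    """Exact distribution of the total reward under a Markov policy."""
    dist = {(s0, 0.0): 1.0}
    for t in range(policy.shape[0]):
        nxt = {}
        for (s, g), p in dist.items():
            a = policy[t, s]
            for s2, q in enumerate(P[a, s]):
                if q > 0:
                    key = (s2, round(float(g + r[s, a]), 10))
                    nxt[key] = nxt.get(key, 0.0) + p * q
        dist = nxt
    out = {}
    for (_, g), p in dist.items():
        out[g] = out.get(g, 0.0) + p
    returns = sorted(out)
    return np.array(returns), np.array([out[g] for g in returns])

def cvar(returns, probs, alpha):
    """Mean of the worst alpha-fraction of outcomes (returns sorted ascending)."""
    remaining, total = alpha, 0.0
    for g, p in zip(returns, probs):
        w = min(p, remaining)
        total += w * g
        remaining -= w
        if remaining <= 0:
            break
    return total / alpha

def best_for_cvar(P, r, horizon, s0, alpha, betas):
    """Sweep beta, keep distinct EntRM-optimal policies, pick the best CVaR."""
    candidates = {}
    for beta in betas:
        policy, _ = entrm_dp(P, r, horizon, beta)
        candidates.setdefault(policy.tobytes(), (beta, policy))
    scored = [(cvar(*return_distribution(P, r, pol, s0), alpha), beta, pol)
              for beta, pol in candidates.values()]
    return max(scored, key=lambda x: x[0])

# Hypothetical toy MDP: state 0 = start, 1 = good, 2 = bad, 3 = neutral.
# Action 0 (safe) moves to neutral; action 1 (risky) is a coin flip good/bad.
P = np.zeros((2, 4, 4))
P[:, 1, 1] = P[:, 2, 2] = P[:, 3, 3] = 1.0
P[0, 0, 3] = 1.0
P[1, 0, 1] = P[1, 0, 2] = 0.5
r = np.array([[0.0, 0.0], [3.0, 3.0], [0.0, 0.0], [1.0, 1.0]])

betas = np.linspace(-3.0, 0.0, 31)  # crude grid, an assumption of this sketch
score, beta, policy = best_for_cvar(P, r, horizon=2, s0=0, alpha=0.25, betas=betas)
```

In this toy problem the risk-neutral end of the grid selects the risky action and sufficiently negative β selects the safe one, so the sweep returns only two distinct policies. Scoring each candidate by CVaR then turns the choice of β into a choice of an interpretable tail metric, which is the role the optimality front plays in the abstract. A fixed grid can miss parameter values where the optimal policy changes, so it should be read as a stand-in for the exact front rather than a substitute for it.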