AI Summary
This work studies robust Markov decision processes (RMDPs) under non-rectangular uncertainty sets to address the NP-hardness of policy evaluation in conventional non-rectangular models. For bounded uncertainty sets constrained by $L_p$ norms, we derive the first dual formulation, revealing a sparse structure in the adversarial policy; leveraging this insight, we design the first computationally efficient robust policy evaluation algorithm. Our method integrates $sa$-rectangular decomposition, duality theory, and robust dynamic programming, substantially reducing computational complexity. Experiments demonstrate that our algorithm achieves orders-of-magnitude speedup over brute-force enumeration while maintaining guaranteed accuracy and exhibiting strong scalability. This work establishes the first theoretically rigorous and computationally tractable framework for non-rectangular RMDPs, filling a critical methodological gap in the literature.
Abstract
We study robust Markov decision processes (RMDPs) with non-rectangular uncertainty sets, which, unlike traditional rectangular models, capture interdependencies across states. While non-rectangular robust policy evaluation is NP-hard in general, even to approximate, we identify a powerful class of $L_p$-bounded uncertainty sets whose structural simplicity avoids these complexity barriers. We further show that this class can be decomposed into infinitely many $sa$-rectangular $L_p$-bounded sets, and we leverage this structure to derive a novel dual formulation for $L_p$ RMDPs. The formulation yields key insights into the adversary's strategy and enables the development of the first robust policy evaluation algorithms for non-rectangular RMDPs. Empirical results demonstrate that our approach significantly outperforms brute-force methods, establishing a promising foundation for future work on non-rectangular robust MDPs.
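To make the $sa$-rectangular building block concrete, the following is a minimal sketch of robust policy evaluation under an $sa$-rectangular $L_1$-bounded kernel uncertainty set. It is an illustration of the general idea, not the paper's algorithm: all names (`robust_policy_evaluation`, `beta`, etc.) are assumptions, and it uses the standard closed form for the adversarial backup (minimizing $\delta^\top v$ over $\sum_s \delta_s = 0$, $\|\delta\|_1 \le \beta$ gives $-\beta\,(\max v - \min v)/2$), while ignoring the nonnegativity constraint on perturbed kernels for simplicity.

```python
import numpy as np

def robust_policy_evaluation(P, R, pi, gamma, beta, tol=1e-8):
    """Illustrative sa-rectangular robust policy evaluation.

    P   : (S, A, S) nominal transition kernel
    R   : (S, A)    reward table
    pi  : (S, A)    policy (rows sum to 1)
    beta: L1 radius of the per-(s, a) kernel uncertainty ball

    For each (s, a) the adversary perturbs P[s, a] by delta with
    sum(delta) = 0 and ||delta||_1 <= beta; the worst case of
    delta @ v is -beta * (max(v) - min(v)) / 2, so the robust
    Bellman backup has a closed form. Nonnegativity of the
    perturbed kernel is ignored here for brevity.
    """
    S, A = R.shape
    v = np.zeros(S)
    while True:
        # Adversarial penalty: span seminorm of the value function.
        penalty = beta * (v.max() - v.min()) / 2.0
        # Robust Q-values: nominal expectation minus the penalty.
        q = R + gamma * (P @ v - penalty)
        v_new = np.einsum("sa,sa->s", pi, q)
        if np.abs(v_new - v).max() < tol:
            return v_new
        v = v_new
```

Setting `beta = 0` recovers ordinary (nominal) policy evaluation, and any `beta > 0` can only lower the fixed-point values, matching the intuition that the adversary shifts probability mass toward low-value states. The iteration converges whenever `gamma * (1 + beta) < 1`.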