🤖 AI Summary
This paper studies distributionally robust Markov decision processes (DR-MDPs), addressing a limitation of existing approaches that rely exclusively on Wasserstein- or KL-divergence-based ambiguity sets. The authors propose the first online Q-learning algorithm applicable to *arbitrary* ambiguity sets consisting of finitely many probability measures. Methodologically, they integrate robust optimization with Q-learning to construct an iterative update scheme grounded in distributionally robust dynamic programming, and establish its convergence under standard assumptions. The contributions are threefold: (1) the first Q-learning framework supporting arbitrarily structured finite ambiguity sets, eliminating dependence on specific distance metrics; (2) an online learning mechanism with theoretical convergence guarantees; and (3) flexible, user-specified ambiguity sets better aligned with real-world uncertainty structures. Numerical experiments demonstrate the algorithm's computational efficiency, scalability, and significantly enhanced policy robustness under model misspecification.
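The summary does not spell out the iteration itself. Under standard robust Q-learning conventions, an update of the kind described would take the worst case over a finite ambiguity set $\mathcal{P}(s,a) = \{\mathbb{P}_1, \dots, \mathbb{P}_K\}$ of transition measures at every step; the learning rate $\alpha_t$, discount factor $\gamma$, and reward $r$ below are generic notation, not taken from the paper:

$$
Q_{t+1}(s,a) \;=\; (1-\alpha_t)\,Q_t(s,a) \;+\; \alpha_t\Big(r(s,a) + \gamma \min_{\mathbb{P}\in\mathcal{P}(s,a)} \mathbb{E}_{s'\sim\mathbb{P}}\big[\max_{a'} Q_t(s',a')\big]\Big).
$$

Because $\mathcal{P}(s,a)$ is finite, the inner minimization reduces to enumerating the $K$ candidate measures, which is consistent with the claim that no metric-specific machinery (Wasserstein or KL duality) is needed.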
📝 Abstract
In this paper we propose a novel $Q$-learning algorithm for solving distributionally robust Markov decision problems in which the ambiguity set of probability measures can be chosen arbitrarily, as long as it contains only finitely many measures. Our approach therefore goes beyond the well-studied setting of ambiguity sets given by balls around a reference measure, where the distance to the reference measure is quantified by the Wasserstein distance or the Kullback-Leibler divergence. This allows the user to design ambiguity sets tailored to her needs and to solve the associated robust Markov decision problem via a $Q$-learning algorithm whose convergence is guaranteed by our main result. Moreover, we showcase the tractability of our approach in several numerical experiments.
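A minimal sketch of what such an algorithm could look like in a tabular setting, assuming one finite ambiguity set per state-action pair. All names here (`ambiguity`, `robust_target`, the step-size schedule, the exploration rate) are illustrative assumptions, not the paper's algorithm:

```python
import numpy as np

rng = np.random.default_rng(0)

n_states, n_actions, n_measures = 5, 2, 3
gamma = 0.9

# ambiguity[s, a] holds n_measures candidate transition distributions
# over next states; any finite collection of measures would do.
ambiguity = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions, n_measures))
reward = rng.standard_normal((n_states, n_actions))

Q = np.zeros((n_states, n_actions))

def robust_target(s, a):
    # Expected continuation value under each candidate measure, then the
    # pessimistic (worst-case) one: a plain minimum over the finite set.
    values = ambiguity[s, a] @ Q.max(axis=1)
    return reward[s, a] + gamma * values.min()

s = 0
for t in range(1, 20001):
    # epsilon-greedy exploration
    a = rng.integers(n_actions) if rng.random() < 0.1 else int(Q[s].argmax())
    alpha = t ** -0.7  # Robbins-Monro-type step sizes
    Q[s, a] += alpha * (robust_target(s, a) - Q[s, a])
    # Simulate the environment with one fixed measure from the set;
    # the learner does not know which measure generates the data.
    s = rng.choice(n_states, p=ambiguity[s, a, 0])

print(np.round(Q, 3))
```

Note that the robust step costs only a factor of $K$ (here `n_measures`) more than a vanilla Q-learning update, which illustrates why arbitrary finite ambiguity sets remain tractable.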