🤖 AI Summary
This paper addresses long-horizon multi-agent decision-making under transition-kernel uncertainty, proposing a distributionally robust Markov game (DR-MG) framework under the average-reward criterion to optimize worst-case sustained performance. Methodologically, it combines distributionally robust optimization, robust Bellman equation analysis, and average-reward reinforcement learning to devise a computationally tractable robust Nash iteration algorithm. Theoretically, it establishes, for the first time, foundations for distributionally robust multi-agent games under average reward: it proves the existence of robust Nash equilibria and shows their asymptotic equivalence to equilibria of discounted games as the discount factor approaches one. This work closes a theoretical gap in average-reward multi-agent distributional robustness and introduces a new paradigm for policy design in uncertain dynamic environments, one that delivers both rigorous theoretical guarantees and practical implementability.
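The asymptotic-equivalence claim above has a classical single-policy analogue that is easy to check numerically: for a fixed policy on an ergodic chain, the normalized discounted value (1 − γ)·V_γ converges to the average gain g as γ → 1. The toy chain, rewards, and helper functions below are illustrative assumptions, not constructions from the paper; the paper's result is the game-theoretic, robust analogue of this limit.

```python
import numpy as np

# Toy 3-state chain induced by some fixed policy (illustrative numbers only).
P = np.array([[0.5, 0.5, 0.0],
              [0.1, 0.6, 0.3],
              [0.2, 0.2, 0.6]])   # nominal transition matrix
r = np.array([1.0, 0.2, 0.7])     # per-step rewards

def discounted_value(P, r, gamma):
    """Discounted value V_gamma = (I - gamma * P)^{-1} r."""
    return np.linalg.solve(np.eye(len(r)) - gamma * P, r)

def average_gain(P, r):
    """Average reward g = pi . r, with pi the stationary distribution of P."""
    w, V = np.linalg.eig(P.T)
    pi = np.real(V[:, np.argmin(np.abs(w - 1))])  # left eigenvector for eigenvalue 1
    pi = pi / pi.sum()                            # normalize (also fixes the sign)
    return float(pi @ r)

g = average_gain(P, r)
# The gap max_s |(1 - gamma) * V_gamma(s) - g| shrinks as gamma -> 1.
errs = [np.max(np.abs((1 - gamma) * discounted_value(P, r, gamma) - g))
        for gamma in (0.9, 0.99, 0.999)]
```

The vanishing-discount expansion (1 − γ)V_γ(s) = g + (1 − γ)h(s) + o(1 − γ) explains why the error is roughly proportional to 1 − γ.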
📝 Abstract
This paper introduces the formulation of a distributionally robust Markov game (DR-MG) with average rewards, a crucial framework for multi-agent decision-making under uncertainty over extended horizons. Unlike finite-horizon or discounted models, the average-reward criterion naturally captures long-term performance for systems designed for continuous operation, where sustained reliability is paramount. We account for uncertainty in transition kernels, with players aiming to optimize their worst-case average reward. We first establish a connection between the multi-agent and single-agent settings, and derive the solvability of the robust Bellman equation under the average-reward formulation. We then rigorously prove the existence of a robust Nash equilibrium (NE), offering essential theoretical guarantees for system stability. We further develop and analyze an algorithm named robust Nash iteration to compute robust NEs among all agents, providing practical tools for identifying optimal strategies in complex, uncertain, and long-running multi-player environments. Finally, we demonstrate the connection between average-reward NEs and the well-studied discounted NEs, showing that the former can be approximated by the latter as the discount factor approaches one. Together, these contributions provide a comprehensive theoretical and algorithmic foundation for this setting, and they extend the robust average-reward framework from single-agent problems to the multi-agent case.
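The single-agent building block the abstract invokes, a robust Bellman equation under the average-reward formulation, can be sketched as robust relative value iteration. Everything below is an illustrative assumption rather than the paper's construction: a toy two-state MDP, a total-variation ball around the nominal kernel as the uncertainty set, and a greedy solver for the inner worst-case expectation.

```python
import numpy as np

def worst_case_expectation(p, v, delta):
    """Inner minimization over a total-variation ball:
    min_q q.v  s.t.  q >= 0, sum(q) = 1, ||q - p||_1 <= delta.
    Greedy solution: shift up to delta/2 probability mass from the
    highest-value states onto the lowest-value state."""
    q = p.astype(float).copy()
    budget = delta / 2.0              # total mass the adversary may move
    lo = int(np.argmin(v))            # destination: lowest-value state
    for s in np.argsort(v)[::-1]:     # drain highest-value states first
        if s == lo or budget <= 0:
            continue
        move = min(q[s], budget)
        q[s] -= move
        q[lo] += move
        budget -= move
    return float(q @ v)

def robust_rvi(P, R, delta, iters=2000, ref=0):
    """Robust relative value iteration (single-agent sketch).
    P[s, a] is the nominal next-state distribution, R[s, a] the reward.
    Returns an estimate of the robust gain g and a bias vector h."""
    S, A = R.shape
    h = np.zeros(S)
    for _ in range(iters):
        Th = np.array([
            max(R[s, a] + worst_case_expectation(P[s, a], h, delta)
                for a in range(A))
            for s in range(S)
        ])
        g = Th[ref]       # gain estimate read off the reference state
        h = Th - g        # subtract to keep iterates bounded
    return g, h

# Toy two-state, two-action MDP (illustrative numbers only).
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.7, 0.3], [0.05, 0.95]]])
R = np.array([[1.0, 0.0],
              [0.5, 0.8]])
g, h = robust_rvi(P, R, delta=0.2)
```

Enlarging the uncertainty set can only lower the worst-case expectation, so the robust gain is monotonically non-increasing in `delta`; the paper's robust Nash iteration couples per-player robust evaluations of this kind, which this single-agent sketch does not attempt.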