🤖 AI Summary
This paper studies multi-agent performative reinforcement learning, where deploying a policy alters the environment's reward and transition dynamics, formulated within Markov potential games (MPGs). The authors introduce the notion of a *performatively stable equilibrium* (PSE) and show that one always exists under a reasonable sensitivity assumption. They then analyze two independent-learning algorithms for MPGs: independent policy gradient ascent (IPGA) and independent natural policy gradient (INPG). Both converge to an approximate PSE in the best-iterate sense, with an additional error term accounting for the performative effects, and INPG additionally converges asymptotically to a PSE in the last-iterate sense; as the performative effects vanish, the rates recover those known for standard MPGs. For a special case of the game, a repeated-retraining approach in which agents independently optimize a surrogate objective admits finite-time last-iterate guarantees. Extensive experiments validate the theoretical findings.
📝 Abstract
Performative Reinforcement Learning (PRL) refers to a scenario in which the deployed policy changes the reward and transition dynamics of the underlying environment. In this work, we study multi-agent PRL by incorporating performative effects into Markov Potential Games (MPGs). We introduce the notion of a performatively stable equilibrium (PSE) and show that it always exists under a reasonable sensitivity assumption. We then provide convergence results for state-of-the-art algorithms used to solve MPGs. Specifically, we show that independent policy gradient ascent (IPGA) and independent natural policy gradient (INPG) converge to an approximate PSE in the best-iterate sense, with an additional term that accounts for the performative effects. Furthermore, we show that INPG asymptotically converges to a PSE in the last-iterate sense. As the performative effects vanish, we recover the convergence rates from prior work. For a special case of our game, we provide finite-time last-iterate convergence results for a repeated retraining approach, in which agents independently optimize a surrogate objective. We conduct extensive experiments to validate our theoretical findings.
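To make the repeated-retraining idea concrete, here is a minimal toy sketch, not the paper's actual setup: a stateless two-agent potential game with hypothetical payoffs, where each agent's reward for an action is penalized in proportion to how often that action was played in the *deployed* policy (a simple performative effect with sensitivity `eps`). Each round, the environment is frozen at the deployed policies and every agent independently runs projected gradient ascent on its own objective.

```python
import numpy as np

# Toy sketch (hypothetical payoffs, not from the paper): both agents share
# the common reward base[a1, a2], so the game is trivially a potential game.
# Playing action 1 incurs an extra cost eps * (that agent's deployed
# probability of action 1), modeling an environment that reacts to deployment.
base = np.array([[1.0, 0.0],
                 [2.0, 2.0]])   # common reward, indexed by the two actions
eps = 3.0                       # sensitivity of the reward to deployment
eta = 0.1                       # gradient step size

def grad(p_other, p_self_deployed):
    """d/dp_self of the expected common reward, with the environment
    frozen at the deployed policy (the repeated-retraining objective)."""
    q = np.array([1.0 - p_other, p_other])
    adv = (base[1] - base[0]) @ q        # advantage of action 1 over action 0
    return adv - eps * p_self_deployed   # performative penalty on action 1

# Independent projected gradient ascent with repeated retraining:
# deploy the current policies, then let each agent ascend independently.
p = np.array([0.2, 0.8])                 # prob. of action 1 for each agent
for _ in range(2000):
    deployed = p.copy()
    g = np.array([grad(p[1], deployed[0]),
                  grad(p[0], deployed[1])])
    p = np.clip(p + eta * g, 0.0, 1.0)   # project back onto [0, 1]

print(np.round(p, 4))  # both agents settle near 0.5
```

In this toy, the performative term changes the outcome qualitatively: with `eps = 0` the gradient `1 + p_other` is always positive and both agents drift to playing action 1 deterministically, whereas with the penalty the retraining dynamics settle at an interior stable point where each agent's gradient vanishes under its own deployed policy.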