Independent RL for Cooperative-Competitive Agents: A Mean-Field Perspective

📅 2024-03-17
🏛️ arXiv.org
📈 Citations: 3
Influential: 0
📄 PDF
🤖 AI Summary
This paper addresses multi-agent reinforcement learning (MARL) in settings featuring intra-team cooperation and general-sum (non-zero-sum) inter-team competition, aiming to design decentralized algorithms that let each team independently minimize its cumulative cost while the joint policies converge to a Nash equilibrium. To tackle the non-stationarity arising from finite populations, the authors study the infinite-population limit, yielding a General-Sum Linear-Quadratic Mean-Field Type Game (GS-MFTG); they characterize its Nash equilibrium under a standard invertibility condition and prove it is an $O(1/M)$-approximate equilibrium for the finite-population game, where $M$ lower-bounds the number of agents per team. They then introduce Multi-player Receding-horizon Natural Policy Gradient (MRNPG), a decentralized RL algorithm with provable convergence for this mixed cooperative-competitive structure: a decomposition into sub-problems via backward recursive discrete-time Hamilton-Jacobi-Isaacs (HJI) equations enables independent natural policy gradient updates, which achieve global linear convergence despite the non-convexity of the objectives. Numerical experiments corroborate the theoretical analysis.

📝 Abstract
We address in this paper Reinforcement Learning (RL) among agents that are grouped into teams such that there is cooperation within each team but general-sum (non-zero sum) competition across different teams. To develop an RL method that provably achieves a Nash equilibrium, we focus on a linear-quadratic structure. Moreover, to tackle the non-stationarity induced by multi-agent interactions in the finite population setting, we consider the case where the number of agents within each team is infinite, i.e., the mean-field setting. This results in a General-Sum LQ Mean-Field Type Game (GS-MFTG). We characterize the Nash equilibrium (NE) of the GS-MFTG, under a standard invertibility condition. This MFTG NE is then shown to be $O(1/M)$-NE for the finite population game where $M$ is a lower bound on the number of agents in each team. These structural results motivate an algorithm called Multi-player Receding-horizon Natural Policy Gradient (MRNPG), where each team minimizes its cumulative cost independently in a receding-horizon manner. Despite the non-convexity of the problem, we establish that the resulting algorithm converges to a global NE through a novel problem decomposition into sub-problems using backward recursive discrete-time Hamilton-Jacobi-Isaacs (HJI) equations, in which independent natural policy gradient is shown to exhibit linear convergence under time-independent diagonal dominance. Numerical studies included corroborate the theoretical results.
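To make the algorithmic structure concrete, below is a minimal toy sketch of the receding-horizon decomposition the abstract describes, for a two-player scalar LQ game: stage sub-problems are solved backward in time, and within each stage both players update their feedback gains independently by gradient steps. This is illustrative only, not the paper's MRNPG: plain gradient replaces natural gradient (in this scalar case they differ only by a state-covariance preconditioner), and all constants (`a`, `b`, `q`, `r`, `T`, `lr`) are made-up toy values.

```python
def solve_stage(a, b, q, r, P_next, lr=0.05, iters=4000):
    """One stage sub-problem: both players independently descend their
    own per-unit-x^2 cost  J_i = q_i + r_i k_i^2 + cl^2 P_next[i],
    where cl = a - b1 k1 - b2 k2 is the closed-loop gain."""
    k = [0.0, 0.0]
    for _ in range(iters):
        cl = a - b[0] * k[0] - b[1] * k[1]
        # dJ_i/dk_i = 2 r_i k_i - 2 b_i cl P_next[i]; simultaneous updates
        # converge here because the stage game is diagonally dominant,
        # echoing the paper's diagonal-dominance condition.
        grads = [2 * r[i] * k[i] - 2 * b[i] * cl * P_next[i] for i in range(2)]
        k = [k[i] - lr * grads[i] for i in range(2)]
    cl = a - b[0] * k[0] - b[1] * k[1]
    # propagate each player's quadratic value coefficient backward
    P = [q[i] + r[i] * k[i] ** 2 + cl ** 2 * P_next[i] for i in range(2)]
    return k, P

def receding_horizon_sketch(T=5):
    """Backward recursion over T stages: solve each stage game given the
    next-stage value coefficients, then step one stage earlier."""
    a, b = 1.0, [0.6, 0.4]              # dynamics x' = a x + b1 u1 + b2 u2
    q, r = [1.0, 0.8], [1.0, 1.2]       # stage costs q_i x^2 + r_i u_i^2
    P = [0.0, 0.0]                      # terminal values are zero
    gains = []                          # gains[0] = terminal stage
    for _ in range(T):
        k, P = solve_stage(a, b, q, r, P)
        gains.append(k)
    return gains, P
```

At the terminal stage the cost-to-go is zero, so both gains are zero; earlier stages produce nonzero stabilizing gains, and the returned `P` gives each player's equilibrium cost coefficient over the full horizon.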
Problem

Research questions and friction points this paper is trying to address.

Reinforcement Learning for cooperative-competitive agents
Nash equilibrium in General-Sum LQ Mean-Field Games
Convergence of Multi-player Receding-horizon Natural Policy Gradient
Innovation

Methods, ideas, or system contributions that make the work stand out.

Mean-Field Game Theory
Independent Natural Policy Gradient
Receding-horizon Optimization