Multi-Agent Q-Learning Dynamics in Random Networks: Convergence due to Exploration and Sparsity

📅 2025-03-13
📈 Citations: 0
Influential citations: 0
🤖 AI Summary
Multi-agent Q-learning often fails to converge to Nash equilibria in large-scale systems because agent interdependence makes the learning environment non-stationary, producing periodic or chaotic oscillations. Method: The paper studies convergence when the interaction network is drawn from random graph models, analyzing networked polymatrix games on Erdős–Rényi and stochastic block models. It combines random graph theory, dynamical-systems stability analysis, and multi-agent reinforcement learning theory to derive a sufficient condition for global convergence. Contribution/Results: The authors establish, for the first time, a joint sufficient condition linking network sparsity and exploration rate that guarantees high-probability convergence to a unique Nash equilibrium, even as the number of agents grows. This addresses the tendency of dense networks to induce chaotic oscillations. Large-scale simulations validate the robustness and practical applicability of the derived condition.
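
As a rough illustration of the two random graph models named above, the sketch below samples Erdős–Rényi and stochastic-block-model interaction networks and parameterises sparsity by the expected degree. The helper names, the p = c / n parameterisation, and the numeric values are illustrative assumptions, not the paper's notation.

```python
import numpy as np

rng = np.random.default_rng(0)

def erdos_renyi_adjacency(n_agents: int, edge_prob: float) -> np.ndarray:
    """Sample a symmetric Erdős–Rényi adjacency matrix G(n, p)."""
    upper = rng.random((n_agents, n_agents)) < edge_prob
    adj = np.triu(upper, k=1)             # keep the upper triangle, no self-loops
    return (adj | adj.T).astype(int)      # symmetrise

def sbm_adjacency(block_sizes, block_probs):
    """Sample a stochastic block model; block_probs[a, b] is the edge
    probability between communities a and b."""
    labels = np.repeat(np.arange(len(block_sizes)), block_sizes)
    n = labels.size
    probs = block_probs[labels[:, None], labels[None, :]]
    upper = rng.random((n, n)) < probs
    adj = np.triu(upper, k=1)
    return (adj | adj.T).astype(int)

# Sparsity control: setting p = c / n keeps the expected degree near c
# even as the number of agents n grows.
n = 200
A_er = erdos_renyi_adjacency(n, edge_prob=5.0 / n)
A_sbm = sbm_adjacency([100, 100], np.array([[8.0 / n, 1.0 / n],
                                            [1.0 / n, 8.0 / n]]))
print(A_er.sum(axis=1).mean(), A_sbm.sum(axis=1).mean())   # average degrees
```
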

📝 Abstract
Beyond specific settings, many multi-agent learning algorithms fail to converge to an equilibrium solution, and instead display complex, non-stationary behaviours such as recurrent or chaotic orbits. In fact, recent literature suggests that such complex behaviours are likely to occur when the number of agents increases. In this paper, we study Q-learning dynamics in network polymatrix games where the network structure is drawn from classical random graph models. In particular, we focus on the Erdős–Rényi model, a well-studied model for social networks, and the Stochastic Block Model, which generalizes the above by accounting for community structures within the network. In each setting, we establish sufficient conditions under which the agents' joint strategies converge to a unique equilibrium. We investigate how this condition depends on the exploration rates, payoff matrices and, crucially, the sparsity of the network. Finally, we validate our theoretical findings through numerical simulations and demonstrate that convergence can be reliably achieved in many-agent systems, provided network sparsity is controlled.
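
To make the setup concrete, here is a minimal sketch of the kind of dynamics the abstract describes: smoothed (Boltzmann) Q-learning on a network polymatrix game, where each agent's mixed strategy is a softmax of its Q-values with an exploration rate (temperature). The update rule, learning rate, and convergence test below are standard choices assumed for illustration; the paper's exact formulation may differ. The usage snippet at the end reuses erdos_renyi_adjacency from the sketch above.

```python
import numpy as np

rng = np.random.default_rng(1)

def simulate_q_learning(adj, payoffs, n_actions=3, temp=1.0, lr=0.05,
                        n_steps=5000, tol=1e-8):
    """Smoothed (Boltzmann) Q-learning dynamics on a network polymatrix game.

    adj      -- (n, n) 0/1 adjacency matrix of the interaction network
    payoffs  -- dict mapping a directed edge (i, j) to the (n_actions, n_actions)
                payoff matrix agent i faces against neighbour j
    temp     -- exploration rate (softmax temperature), shared by all agents
    Returns the final mixed strategies and whether they settled within tol.
    """
    n = adj.shape[0]
    Q = np.zeros((n, n_actions))
    x = np.full((n, n_actions), 1.0 / n_actions)     # uniform initial strategies

    for _ in range(n_steps):
        # Expected payoff of each action against current neighbour strategies.
        rewards = np.zeros((n, n_actions))
        for i in range(n):
            for j in np.flatnonzero(adj[i]):
                rewards[i] += payoffs[(i, j)] @ x[j]

        Q = (1 - lr) * Q + lr * rewards              # exponential averaging of payoffs
        logits = Q / temp
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        x_new = np.exp(logits)
        x_new /= x_new.sum(axis=1, keepdims=True)    # Boltzmann (softmax) policies

        if np.max(np.abs(x_new - x)) < tol:          # joint strategies have settled
            return x_new, True
        x = x_new
    return x, False

# Small example: a sparse Erdős–Rényi network with random pairwise payoffs.
n, k = 50, 3
adj = erdos_renyi_adjacency(n, edge_prob=4.0 / n)
payoffs = {(i, j): rng.uniform(-1, 1, (k, k))
           for i in range(n) for j in np.flatnonzero(adj[i])}
x_final, converged = simulate_q_learning(adj, payoffs, n_actions=k, temp=0.5)
print("converged:", converged)
```
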
Problem

Research questions and friction points this paper is trying to address.

Study Q-learning dynamics in network polymatrix games whose interaction structure is drawn from random graph models.
Establish sufficient conditions for convergence to a unique equilibrium in many-agent systems.
Investigate how convergence depends on exploration rates, payoff matrices, and network sparsity (an illustrative parameter sweep is sketched after this list).
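
The sweep below (reusing erdos_renyi_adjacency, simulate_q_learning, and rng from the sketches above) empirically probes the third question: how often the dynamics settle as the exploration rate and the network's expected degree vary. It is an illustrative experiment design, not the paper's protocol, and the grid values are arbitrary.

```python
import numpy as np

def convergence_sweep(temps, expected_degrees, n_agents=50, n_actions=3,
                      n_trials=10):
    """Fraction of runs whose joint strategies settle, for each combination of
    exploration rate (temperature) and expected degree of the random network."""
    rates = np.zeros((len(temps), len(expected_degrees)))
    for ti, temp in enumerate(temps):
        for di, deg in enumerate(expected_degrees):
            hits = 0
            for _ in range(n_trials):
                adj = erdos_renyi_adjacency(n_agents, edge_prob=deg / n_agents)
                payoffs = {(i, j): rng.uniform(-1, 1, (n_actions, n_actions))
                           for i in range(n_agents)
                           for j in np.flatnonzero(adj[i])}
                _, settled = simulate_q_learning(adj, payoffs,
                                                 n_actions=n_actions, temp=temp)
                hits += settled
            rates[ti, di] = hits / n_trials
    return rates

# If the sparsity/exploration intuition holds, convergence frequency should
# increase with temperature and decrease with expected degree.
print(convergence_sweep(temps=[0.2, 1.0, 5.0], expected_degrees=[2, 8, 32]))
```
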
Innovation

Methods, ideas, or system contributions that make the work stand out.

Q-learning dynamics in network polymatrix games
Convergence conditions based on network sparsity
Validation through Erdős–Rényi and Stochastic Block Models