Cycles and collusion in congestion games under Q-learning

📅 2025-02-26
📈 Citations: 0
Influential: 0
🤖 AI Summary
This paper investigates the dynamic evolution of Q-learning in generalized Braess-paradox congestion games, focusing on convergence and stability when stage-game Nash equilibria are socially inefficient. Employing game-theoretic dynamics analysis, meta-game modeling, and rigorous convergence theory, the authors establish three results: (1) heterogeneity in learners' parameters constitutes a meta-game Nash equilibrium, inducing incentive misalignment and cooperation failure; (2) the system exhibits bimodal behavior, either converging to a fixed point or oscillating periodically (reminiscent of Edgeworth cycles), depending on learning rates; (3) noncooperative parameter configurations preclude social welfare from exceeding the Nash benchmark. Building on these findings, the authors propose an analytical framework linking regulatory intervention and implicit collusion, offering a mechanistic explanation for efficiency loss in multi-agent reinforcement learning.

📝 Abstract
We investigate the dynamics of Q-learning in a class of generalized Braess paradox games. These games represent an important class of network routing games where the associated stage-game Nash equilibria do not constitute social optima. We provide a full convergence analysis of Q-learning with varying parameters and learning rates. A wide range of phenomena emerges, broadly either settling into Nash or cycling continuously in ways reminiscent of "Edgeworth cycles" (i.e. jumping suddenly from Nash toward social optimum and then deteriorating gradually back to Nash). Our results reveal an important incentive incompatibility when thinking in terms of a meta-game being played by the designers of the individual Q-learners who set their agents' parameters. Indeed, Nash equilibria of the meta-game are characterized by heterogeneous parameters, and resulting outcomes achieve little to no cooperation beyond Nash. In conclusion, we suggest a novel perspective for thinking about regulation and collusion, and discuss the implications of our results for Bertrand oligopoly pricing games.
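The setting described in the abstract, independent Q-learners repeatedly choosing routes in a Braess network whose shortcut is individually attractive but collectively harmful, can be illustrated with a minimal sketch. Everything below (the specific latency functions, agent count, stateless Q-update, and ε-greedy exploration) is an illustrative assumption, not the paper's actual model or parameterization:

```python
import random

# Illustrative Braess network: two flow-dependent edges and two constant
# edges, plus a zero-cost shortcut. 'top' and 'bottom' each use one
# flow-dependent edge; 'cross' uses both (the paradoxical shortcut).
def route_cost(route, loads, n_agents):
    x = {r: loads.get(r, 0) / n_agents for r in ('top', 'bottom', 'cross')}
    if route == 'top':
        return (x['top'] + x['cross']) + 1.0
    if route == 'bottom':
        return 1.0 + (x['bottom'] + x['cross'])
    return (x['top'] + x['cross']) + (x['bottom'] + x['cross'])  # 'cross'

def run(n_agents=6, episodes=3000, alpha=0.1, eps=0.1, seed=0):
    """Stateless Q-learning over routes; each agent minimizes its own cost."""
    rng = random.Random(seed)
    routes = ('top', 'bottom', 'cross')
    Q = [{r: 0.0 for r in routes} for _ in range(n_agents)]
    for _ in range(episodes):
        # epsilon-greedy choice: explore, or pick the currently cheapest route
        choices = [
            rng.choice(routes) if rng.random() < eps
            else min(Q[i], key=Q[i].get)
            for i in range(n_agents)
        ]
        loads = {r: choices.count(r) for r in routes}
        for i, r in enumerate(choices):
            cost = route_cost(r, loads, n_agents)
            Q[i][r] += alpha * (cost - Q[i][r])  # track realized route cost
    return Q
```

In this toy network, everyone taking the shortcut yields per-agent cost 2.0, while an even top/bottom split yields 1.5, so the paradox structure the abstract relies on is present; the learning-rate and exploration parameters (`alpha`, `eps`) are the kind of knobs the paper's meta-game analysis treats as strategically chosen.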
Problem

Research questions and friction points this paper is trying to address.

Q-learning dynamics in Braess paradox games
Convergence analysis with varying parameters
Incentive incompatibility in Q-learners' meta-game design
Innovation

Methods, ideas, or system contributions that make the work stand out.

Q-learning dynamics in Braess paradox games
Full convergence analysis across learning parameters
Meta-game perspective on regulation and collusion