Understanding algorithmic collusion with experience replay

📅 2021-02-18
🏛️ arXiv.org
📈 Citations: 2
Influential: 0
🤖 AI Summary
This paper investigates how independent Q-learning agents spontaneously learn to sustain supra-competitive prices in infinitely repeated pricing games. Moving beyond "black-box collusion" accounts with limited explanatory power, the authors identify two human-driven design factors: the experience replay policy (in particular, recency bias) and relative performance concerns. They design three experience replay variants and run robustness experiments in a heterogeneous multi-agent setting. Results show that randomized experience replay effectively suppresses collusion, driving outcomes toward the static Bertrand equilibrium; conversely, recency-biased replay or the incorporation of relative performance feedback markedly strengthens both the stability and persistence of high-price collusion. The work provides the first systematic causal evidence of how experience replay and relative performance incentives shape the formation of algorithmic collusion, and it suggests concrete regulatory levers, such as auditing replay mechanisms, for mitigating anti-competitive behavior in autonomous agent systems.
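The contrast between replay variants can be illustrated with a small sketch. The sampling scheme below is a hypothetical geometric-weighting scheme, not the paper's exact specification: `recency_bias = 0` gives uniform random sampling (the variant found to suppress collusion), while larger values weight recent transitions more heavily (the variant found to restore high-price collusion).

```python
import random

def sample_batch(buffer, k, recency_bias=0.0):
    """Sample k transitions from a replay buffer.

    recency_bias = 0.0 -> uniform random replay.
    recency_bias > 0.0 -> later (more recent) transitions are
    geometrically more likely to be drawn. The geometric weighting
    is an illustrative assumption, not the paper's exact scheme.
    """
    n = len(buffer)
    if recency_bias == 0.0:
        return random.choices(buffer, k=k)
    # Geometric weights: the most recent transition (index n-1)
    # receives the largest weight.
    weights = [(1.0 + recency_bias) ** i for i in range(n)]
    return random.choices(buffer, weights=weights, k=k)
```

With a strong bias, almost all sampled transitions come from the recent end of the buffer, so the learner keeps reinforcing its latest (possibly collusive) price path instead of revisiting older, more competitive experience.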
📝 Abstract
In an infinitely repeated pricing game, pricing algorithms based on artificial intelligence (Q-learning) may consistently learn to charge supra-competitive prices even without communication. Although concerns about algorithmic collusion have arisen, little is known about the underlying factors. In this work, we experimentally analyze the dynamics of algorithms with three variants of experience replay. Algorithmic collusion still has roots in human preferences. Randomizing experience yields prices close to the static Bertrand equilibrium, and higher prices are easily restored by favoring the latest experience. Moreover, relative performance concerns also stabilize the collusion. Finally, we investigate scenarios with heterogeneous agents and test robustness to various factors.
Problem

Research questions and friction points this paper is trying to address.

Investigates algorithmic collusion in pricing games using independent reinforcement learners
Analyzes how relative performance considerations affect long-term pricing dynamics
Explores whether the design of experience replay in Q-learning can mitigate collusive outcomes
Innovation

Methods, ideas, or system contributions that make the work stand out.

Incorporating relative performance metrics into experience replay
Showing that independent learners converge near the static Bertrand equilibrium under randomized replay
Demonstrating that recency-biased replay restores and stabilizes high-price collusion
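The relative performance concern can be expressed as a modified payoff that an agent stores in its replay buffer. The helper below is an illustrative assumption (the function name, the averaging over rivals, and the `weight` parameter are not taken from the paper): the agent's stored reward is its own profit minus a weighted average of rivals' profits, so `weight = 0` recovers the standard absolute-profit objective.

```python
def relative_reward(own_profit, rival_profits, weight=0.5):
    """Hypothetical relative-performance payoff.

    own_profit    : this agent's period profit.
    rival_profits : list of rivals' period profits.
    weight        : how strongly the agent penalizes rivals' success;
                    weight = 0 reduces to the absolute-profit objective.
    """
    avg_rival = sum(rival_profits) / len(rival_profits)
    return own_profit - weight * avg_rival
```

Under such a payoff, undercutting a rival hurts the deviator twice (its own margin falls while the rival's relative standing is unaffected by symmetric punishment), which is one intuition for why relative performance concerns stabilize collusion.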