🤖 AI Summary
This paper studies how Q-learning agents spontaneously converge to supracompetitive (i.e., collusive) prices in infinitely repeated pricing games. Focusing on settings that admit both a one-shot Nash equilibrium price and a feasible collusive price, it gives the first rigorous theoretical explanation of Q-learning-induced collusion: when a collusion-enabling price exists and the Q-function satisfies a terminal inequality condition at the end of experimentation, agents relying solely on profit feedback converge to a stable, persistent price strictly above the competitive level. Methodologically, the work combines Q-learning, repeated-game theory, and subgame-perfect equilibrium (SPE) analysis, introducing a class of one-memory SPEs to characterize when naive collusion, grim trigger policies, and increasing strategies support the learned behavior. The core contribution is showing that decentralized, knowledge-free learning, with no prior equilibrium computation, endogenously produces stable non-Nash collusive outcomes.
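To make the setting concrete, here is a minimal sketch of the kind of environment the paper analyzes: two Q-learning firms repeatedly set prices, observe only their own profits, and condition on the previous period's price pair (a one-memory state, matching the paper's one-memory SPEs). The demand model, price grid, and all hyperparameters below are illustrative assumptions, not the paper's construction.

```python
# Illustrative sketch: two Q-learning firms in a repeated Bertrand-style pricing game.
# Demand, price grid, and learning parameters are assumed for illustration only.
import numpy as np

rng = np.random.default_rng(0)

prices = np.array([1.0, 1.5, 2.0])  # hypothetical grid: low ("competitive") to high ("collusive")
n = len(prices)

def profit(a_i, a_j):
    """Toy symmetric profit: the lower-priced firm serves the market; ties split it."""
    p_i, p_j = prices[a_i], prices[a_j]
    demand = max(0.0, 4.0 - min(p_i, p_j))  # assumed linear demand at the market price
    if p_i < p_j:
        return p_i * demand
    if p_i > p_j:
        return 0.0
    return 0.5 * p_i * demand

# One-memory state: last period's price pair. Q[firm][(a0, a1)][action].
Q = [np.zeros((n, n, n)) for _ in range(2)]

alpha, gamma = 0.1, 0.95       # learning rate, discount factor (assumed)
eps0, decay = 1.0, 0.9995      # epsilon-greedy experimentation schedule (assumed)
state = (0, 0)

for t in range(200_000):
    eps = eps0 * decay**t
    acts = []
    for i in range(2):
        if rng.random() < eps:
            acts.append(int(rng.integers(n)))          # explore
        else:
            acts.append(int(np.argmax(Q[i][state])))   # exploit
    rewards = [profit(acts[0], acts[1]), profit(acts[1], acts[0])]
    nxt = (acts[0], acts[1])
    for i in range(2):
        # Standard Q-learning update from profit feedback alone.
        Q[i][state][acts[i]] += alpha * (rewards[i] + gamma * Q[i][nxt].max()
                                         - Q[i][state][acts[i]])
    state = nxt

print("post-learning prices:", prices[state[0]], prices[state[1]])
```

In runs of this kind, the prices the agents settle on after experimentation ends depend on the terminal state of the Q-function, which is exactly the inequality condition the paper exploits in its convergence result.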
📝 Abstract
There is growing experimental evidence that $Q$-learning agents may learn to charge supracompetitive prices. We provide the first theoretical explanation for this behavior in infinitely repeated games. Firms update their pricing policies based solely on observed profits, without computing equilibrium strategies. We show that when the game admits both a one-stage Nash equilibrium price and a collusion-enabling price, and when the $Q$-function satisfies certain inequalities at the end of experimentation, firms learn to consistently charge supracompetitive prices. We introduce a new class of one-memory subgame perfect equilibria (SPEs) and provide conditions under which the learned behavior is supported by naive collusion, grim trigger policies, or increasing strategies. Naive collusion does not constitute an SPE unless the collusion-enabling price is a one-stage Nash equilibrium, whereas grim trigger policies can.
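The contrast in the final sentence follows from textbook repeated-game algebra, not from anything specific to the paper: naive collusion never punishes a deviation, so undercutting is always profitable unless the collusive price is itself a stage-game Nash equilibrium, whereas grim trigger threatens permanent reversion to Nash, which deters deviation for patient firms. A short check of that incentive constraint, with hypothetical stage profits:

```python
# Standard grim-trigger incentive check (illustrative numbers, not from the paper):
# collude forever vs. deviate once, then suffer Nash reversion forever.
def grim_trigger_sustainable(pi_coll, pi_dev, pi_nash, delta):
    """True iff pi_coll/(1-delta) >= pi_dev + delta*pi_nash/(1-delta)."""
    collude_forever = pi_coll / (1 - delta)
    deviate_then_punish = pi_dev + delta * pi_nash / (1 - delta)
    return collude_forever >= deviate_then_punish

# Assumed stage profits: collusive, one-shot deviation, and stage-Nash.
pi_coll, pi_dev, pi_nash = 3.0, 5.0, 1.0

# Critical discount factor: delta* = (pi_dev - pi_coll) / (pi_dev - pi_nash) = 0.5 here.
delta_star = (pi_dev - pi_coll) / (pi_dev - pi_nash)
print(f"critical delta = {delta_star:.2f}")
print(grim_trigger_sustainable(pi_coll, pi_dev, pi_nash, delta=0.95))  # True: patient firms collude
print(grim_trigger_sustainable(pi_coll, pi_dev, pi_nash, delta=0.30))  # False: impatient firms deviate
```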