Memory-Induced Supra-Competitive Outcomes Between Deep Reinforcement Learning Agents in Optimal Trade Execution

📅 2026-05-19

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This study investigates whether deep reinforcement learning agents can achieve supra-competitive outcomes, defined as lower execution costs than the game-theoretic competitive benchmark, in a shared optimal trade execution setting. By formulating a two-agent Almgren-Chriss liquidation game, the authors compare non-interactive ex-ante schedule-learning agents against several history-conditioned DDQN architectures to examine how information feedback shapes the learned strategies. The findings indicate that supra-competitive performance does not arise from multi-agent learning or from current price observations alone, but from the interplay of state dependence along execution trajectories, memory, and interactive feedback. In particular, when agents have access to intra-episode history, especially recent prices and their own past actions, supra-competitive behavior becomes substantially more frequent and more persistent.

📝 Abstract

In this paper, we investigate whether deep reinforcement-learning agents interacting in a shared optimal-execution environment can sustain supra-competitive outcomes, in the sense of achieving lower implementation shortfalls than the relevant game-theoretical competitive benchmark. We study a two-agent Almgren-Chriss liquidation game and examine how learned behavior depends on intra-episode environment feedback, the ability to interpret the mid-price, and the agent's knowledge of the past. We first use ex-ante schedule-learning agents to remove intra-episode feedback and isolate what can arise when agents commit to complete liquidation trajectories before execution begins. We then allow agents to condition on the evolving state using a variety of DDQN architectures. We find that, when agents are given access to intra-episode history, especially recent prices and own past actions, supra-competitive outcomes become substantially more frequent and more persistent. These findings indicate that supra-competitive behavior in this execution game is driven not by multi-agent learning or by current price observation alone, but by feedback, memory, and state-contingent interaction along the realized execution path.
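
To make the quantity being compared concrete, the following is a minimal sketch of how implementation shortfall could be computed for two agents liquidating in a shared Almgren-Chriss-style market. It is an illustration under stated assumptions, not the paper's environment: the function name `implementation_shortfall`, the linear impact form, the rule that both agents' trades move the same mid-price and share the same temporary impact, and all parameter values (`s0`, `kappa`, `alpha`, `sigma`) are hypothetical choices made here.

```python
import numpy as np


def implementation_shortfall(schedule_a, schedule_b, s0=100.0,
                             kappa=0.001, alpha=0.002, sigma=0.0, seed=0):
    """Per-agent implementation shortfall in a toy two-agent liquidation game.

    Assumptions (not taken from the paper): linear permanent impact `kappa`
    and linear temporary impact `alpha`, both driven by the combined trade
    size of the two agents; Gaussian mid-price noise with scale `sigma`.
    """
    rng = np.random.default_rng(seed)
    a = np.asarray(schedule_a, dtype=float)  # shares sold by agent A per step
    b = np.asarray(schedule_b, dtype=float)  # shares sold by agent B per step
    mid = s0
    proceeds = np.zeros(2)
    for va, vb in zip(a, b):
        total = va + vb
        # Both agents execute at the mid-price minus the shared temporary impact.
        exec_price = mid - alpha * total
        proceeds += np.array([va, vb]) * exec_price
        # Permanent impact of the combined flow, plus optional noise.
        mid += sigma * rng.standard_normal() - kappa * total
    # Shortfall: value of the inventory at the initial price minus cash received.
    initial_value = s0 * np.array([a.sum(), b.sum()])
    return initial_value - proceeds


# Two agents each liquidating 100 shares over 4 steps.
twap = [25.0, 25.0, 25.0, 25.0]
front_loaded = [70.0, 10.0, 10.0, 10.0]
print(implementation_shortfall(twap, twap))
print(implementation_shortfall(front_loaded, twap))
```

In this toy setting each agent's cost depends on the other's schedule, which is the interaction the abstract refers to: an outcome counts as supra-competitive when the realized shortfalls are lower than those of the competitive benchmark schedules.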

Problem

Research questions and friction points this paper is trying to address.

supra-competitive outcomes

optimal trade execution

deep reinforcement learning

multi-agent interaction

execution cost

Innovation

Methods, ideas, or system contributions that make the work stand out.

supra-competitive outcomes

deep reinforcement learning

optimal trade execution