🤖 AI Summary
This study investigates whether deep reinforcement learning agents can achieve supra-competitive outcomes, defined as lower execution costs than the game-theoretic competitive benchmark, in a shared optimal trade execution setting. By formulating a two-agent Almgren-Chriss liquidation game, the authors compare non-interactive ex-ante schedule-learning agents against several history-conditioned DDQN architectures to examine how information feedback shapes the learned strategies. The findings show that supra-competitive performance does not arise from multi-agent learning or from observing the current price alone; it emerges from the interplay of state dependence along the execution trajectory, memory, and interactive feedback. In particular, when agents have access to intra-episode history, especially recent prices and their own past actions, supra-competitive behavior becomes substantially more frequent and more persistent.
📝 Abstract
In this paper, we investigate whether deep reinforcement-learning agents interacting in a shared optimal-execution environment can sustain supra-competitive outcomes, in the sense of achieving lower implementation shortfalls than the relevant game-theoretic competitive benchmark. We study a two-agent Almgren-Chriss liquidation game and examine how learned behavior depends on intra-episode environment feedback, the ability to interpret the mid-price, and the agent's knowledge of the past. We first use ex-ante schedule-learning agents to remove intra-episode feedback and isolate what can arise when agents commit to complete liquidation trajectories before execution begins. We then allow agents to condition on the evolving state using a variety of DDQN architectures. We find that, when agents are given access to intra-episode history, especially recent prices and own past actions, supra-competitive outcomes become substantially more frequent and more persistent. These findings indicate that supra-competitive behavior in this execution game is driven not by multi-agent learning or by current price observation alone, but by feedback, memory, and state-contingent interaction along the realized execution path.
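
To make the notion of implementation shortfall in a shared execution environment concrete, the following is a minimal sketch and not the paper's implementation. It assumes linear permanent and temporary price impact that both depend on the total quantity sold by all agents at each step. The function name `implementation_shortfalls` and the parameters `s0`, `gamma`, `eta`, and `sigma` are hypothetical choices made for illustration; the abstract does not specify the impact model or any parameter values.

```python
import numpy as np

def implementation_shortfalls(schedules, s0=100.0, gamma=0.01, eta=0.05,
                              sigma=0.0, seed=0):
    """Implementation shortfall of each agent under shared linear impact.

    schedules: array-like of shape (n_agents, n_steps), shares sold per step.
    All parameter values are illustrative assumptions, not taken from the paper.
    """
    rng = np.random.default_rng(seed)
    schedules = np.asarray(schedules, dtype=float)
    n_agents, n_steps = schedules.shape

    mid = s0                       # mid-price shared by all agents
    revenue = np.zeros(n_agents)   # cash collected by each agent

    for k in range(n_steps):
        total = schedules[:, k].sum()
        # Assumed temporary impact: every agent trades at a price pushed
        # down by the total quantity sold in this step.
        exec_price = mid - eta * total
        revenue += schedules[:, k] * exec_price
        # Assumed permanent impact plus optional Gaussian noise on the mid-price.
        mid = mid - gamma * total + sigma * rng.standard_normal()

    # Shortfall: value of the initial inventory at s0 minus realized revenue.
    return schedules.sum(axis=1) * s0 - revenue

# Two agents each liquidate 10 shares over two steps.
both_even = implementation_shortfalls([[5, 5], [5, 5]])
# The first agent trades the same schedule while the second stays out.
first_alone = implementation_shortfalls([[5, 5], [0, 0]])
print(both_even, first_alone)
```

With the noise term switched off, the first agent's shortfall is higher when the second agent sells alongside it than when it trades alone. This coupling of each agent's cost to the other's schedule is what makes the liquidation problem a game. The sketch does not compute the competitive benchmark itself.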