When AI Trading Agents Compete: Adverse Selection of Meta-Orders by Reinforcement Learning-Based Market Making

📅 2025-10-31
📈 Citations: 0
Influential: 0
🤖 AI Summary
This paper investigates how high-frequency trading (HFT) agents exploit mid-frequency traders via reinforcement-learning-driven adverse selection, and whether this exacerbates their slippage costs. Method: the authors propose an RL framework integrating impulse control with an endogenous price-impact Hawkes process to model AI-driven market making in limit order books; training employs proximal policy optimization (PPO) augmented with self-imitation learning. Contribution/Results: the trained HFT agent detects and profits from the price drift induced by meta-orders, validating an AI-enabled adverse selection mechanism in simulation. Contrary to conventional wisdom, however, this strategy does not significantly increase the average slippage cost for mid-frequency traders, revealing a nonlinear interplay between informational advantage and market impact in modern electronic markets. The findings challenge prevailing assumptions about a direct relationship between predatory HFT behavior and execution-cost degradation.

📝 Abstract
We investigate the mechanisms by which medium-frequency trading agents are adversely selected by opportunistic high-frequency traders. We use reinforcement learning (RL) within a Hawkes limit order book (LOB) model to replicate the behaviours of high-frequency market makers. In contrast to classical models with exogenous price-impact assumptions, the Hawkes model accounts for endogenous price impact and other key properties of the market (Jain et al. 2024a). Because it is impractical for a real-world market maker to update its strategy at every event in the LOB, we formulate the high-frequency market-making agent via an impulse control reinforcement learning framework (Jain et al. 2025). The RL agent is trained with Proximal Policy Optimisation (PPO) combined with self-imitation learning. To replicate the adverse selection phenomenon, we let the RL agent trade against a medium-frequency trader (MFT) executing a meta-order and demonstrate that, after training against the MFT execution agent, the RL market-making agent learns to capitalise on the price drift induced by the meta-order. Recent empirical studies have shown that medium-frequency traders are increasingly subject to adverse selection by high-frequency trading agents, and as high-frequency trading continues to proliferate across financial markets, the slippage costs incurred by medium-frequency traders are likely to rise. However, we do not observe that increased profits for the market-making RL agent necessarily cause significantly increased slippage for the MFT agent.
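The Hawkes LOB model above rests on self-exciting order flow: each arrival raises the intensity of future arrivals, which is what lets price impact emerge endogenously rather than being imposed. As a minimal sketch (a univariate process with an exponential kernel, not the paper's full multivariate LOB model; all parameter values are illustrative), such a process can be simulated with Ogata's thinning algorithm:

```python
import math
import random

def simulate_hawkes(mu, alpha, beta, horizon, seed=0):
    """Simulate a univariate Hawkes process with intensity
    lambda(t) = mu + sum_{t_i < t} alpha * exp(-beta * (t - t_i))
    via Ogata's thinning algorithm (stable when alpha / beta < 1)."""
    rng = random.Random(seed)
    events = []
    t = 0.0
    while t < horizon:
        # With an exponential kernel the intensity only decays between events,
        # so the intensity evaluated now upper-bounds it until the next arrival.
        lam_bar = mu + sum(alpha * math.exp(-beta * (t - ti)) for ti in events)
        t += rng.expovariate(lam_bar)  # candidate arrival at the bounding rate
        if t >= horizon:
            break
        lam_t = mu + sum(alpha * math.exp(-beta * (t - ti)) for ti in events)
        if rng.random() * lam_bar <= lam_t:
            events.append(t)  # accepted arrival: excites future intensity
    return events
```

With branching ratio alpha / beta below one, the long-run event rate is mu / (1 - alpha / beta); the resulting clustering of arrivals is the kind of structure a market-making agent can, in principle, learn to read.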
Problem

Research questions and friction points this paper is trying to address.

Investigating adverse selection of medium-frequency traders by high-frequency agents
Modeling market making behaviors using reinforcement learning in Hawkes framework
Analyzing how RL agents capitalize on meta-order induced price drift
Innovation

Methods, ideas, or system contributions that make the work stand out.

Reinforcement learning in Hawkes model for market making
Impulse control framework for high-frequency trading strategies
PPO and self-imitation learning to exploit price drift
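To illustrate the impulse-control idea, i.e. that the market maker intervenes only at chosen moments rather than reacting to every LOB event, the following is a hedged sketch; the event format, the `requote` callback, and the trigger thresholds are illustrative assumptions, not the paper's actual interface:

```python
def run_impulse_policy(events, requote, min_gap=0.5, inventory_limit=5):
    """Illustrative impulse-control loop: the market maker re-quotes only
    when an intervention condition fires, instead of at every LOB event.

    `events` is a list of (time, fill) tuples and `requote(t, inventory)`
    is a hypothetical policy callback; both are assumptions for this sketch.
    """
    inventory = 0
    last_action_time = float("-inf")
    actions = []
    for t, fill in events:
        inventory += fill
        # Impulse trigger: enough time has elapsed since the last action,
        # or inventory has breached its band.
        if t - last_action_time >= min_gap or abs(inventory) > inventory_limit:
            actions.append((t, requote(t, inventory)))
            last_action_time = t
    return actions, inventory
```

The design point is that the action set is sparse by construction, which is what makes the strategy implementable at realistic decision frequencies.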
Authors
Ali Raza Jafree (Department of Computer Science, University College London, London, UK)
Konark Jain (PhD Student, University College London; topics: Machine Learning, Artificial Intelligence, Mathematical Models, Financial Mathematics)
Nick Firoozye (Department of Computer Science, University College London, London, UK)