Deep reinforcement learning for optimal trading with partial information

📅 2025-10-31
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This paper addresses the design of optimal trading strategies under incomplete information, leveraging a regime-switching Ornstein–Uhlenbeck signal. The authors propose three deep reinforcement learning frameworks integrating Gated Recurrent Units (GRUs) and Deep Deterministic Policy Gradient (DDPG), uniquely embedding probabilistic regime estimation into the decision-making process. Specifically, GRUs model the temporal dynamics of the signal, while a posterior probability estimation module—coupled with a signal prediction component—enables multi-stage information fusion. The key contribution is replacing deterministic latent-state encoding with a probabilistic regime representation, thereby enhancing both the interpretability and robustness of trading policies. Empirical evaluation demonstrates that the proposed prob-DDPG algorithm significantly outperforms benchmark methods in regime-switching markets: it achieves higher cumulative returns and lower maximum drawdown, validating the efficacy of explicit latent-regime modeling for adaptive trading decisions.

📝 Abstract
Reinforcement Learning (RL) applied to financial problems has been a lively area of research. The use of RL for optimal trading strategies that exploit latent information in the market has, to the best of our knowledge, not been widely tackled. In this paper we study an optimal trading problem where a trading signal follows an Ornstein-Uhlenbeck process with regime-switching dynamics. We employ a blend of RL and Recurrent Neural Networks (RNNs) to extract as much underlying information as possible from the trading signal with latent parameters. The latent parameters driving the signal's mean-reversion level, reversion speed, and volatility are filtered from observations of the signal, and trading strategies are derived via RL. To address this problem, we propose three Deep Deterministic Policy Gradient (DDPG)-based algorithms that integrate Gated Recurrent Unit (GRU) networks to capture temporal dependencies in the signal. The first, a one-step approach (hid-DDPG), directly encodes hidden states from the GRU into the RL trader. The second and third are two-step methods: one (prob-DDPG) makes use of posterior regime probability estimates, while the other (reg-DDPG) relies on forecasts of the next signal value. Through extensive simulations with increasingly complex Markovian regime dynamics for the trading signal's parameters, as well as an empirical application to equity pair trading, we find that prob-DDPG achieves superior cumulative rewards and exhibits more interpretable strategies. By contrast, reg-DDPG provides limited benefits, while hid-DDPG offers intermediate performance with less interpretable strategies. Our results show that the quality and structure of the information supplied to the agent are crucial: embedding probabilistic insights into latent regimes substantially improves both the profitability and robustness of reinforcement learning-based trading strategies.
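The signal model described in the abstract — an Ornstein-Uhlenbeck process whose parameters switch with a hidden Markov chain — can be sketched as follows. All parameter values (regime count, transition probabilities, per-regime OU parameters) are illustrative placeholders, not taken from the paper:

```python
import numpy as np

def simulate_regime_switching_ou(T=1000, dt=0.01, seed=0):
    """Simulate a trading signal following an Ornstein-Uhlenbeck process
    whose parameters switch with a hidden two-state Markov chain.
    Parameter values below are illustrative, not from the paper."""
    rng = np.random.default_rng(seed)
    # Per-regime OU parameters: (mean level theta, reversion speed kappa, volatility sigma)
    params = [(0.0, 2.0, 0.2), (0.5, 5.0, 0.4)]
    # Per-step transition probability matrix of the hidden Markov chain
    P = np.array([[0.99, 0.01],
                  [0.02, 0.98]])
    regimes = np.empty(T, dtype=int)
    x = np.empty(T)
    regimes[0], x[0] = 0, 0.0
    for t in range(1, T):
        regimes[t] = rng.choice(2, p=P[regimes[t - 1]])
        theta, kappa, sigma = params[regimes[t]]
        # Euler-Maruyama step: dX = kappa*(theta - X) dt + sigma dW
        x[t] = x[t - 1] + kappa * (theta - x[t - 1]) * dt \
               + sigma * np.sqrt(dt) * rng.normal()
    return x, regimes
```

The agent observes only `x`; the regime path `regimes` is latent and must be inferred from the signal.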
Problem

Research questions and friction points this paper is trying to address.

Developing optimal trading strategies using latent market information
Filtering hidden parameters from trading signals with regime-switching dynamics
Integrating recurrent networks with reinforcement learning for temporal dependencies
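The filtering step behind the second bullet — recovering posterior regime probabilities from observations of the signal — can be sketched as a discrete-time Bayesian filter: the likelihood of each observed increment under each regime's OU dynamics updates the regime posterior, which is then available as an input to the trading agent. This is an assumed, simplified filter, not necessarily the one used in the paper:

```python
import numpy as np

def regime_posterior(x, params, P, dt=0.01):
    """Discrete-time Bayesian filter for posterior regime probabilities.
    x: observed signal path; params: list of (theta, kappa, sigma) per regime;
    P: Markov transition matrix. Illustrative sketch only."""
    n_regimes = len(params)
    pi = np.full(n_regimes, 1.0 / n_regimes)        # uniform prior
    posteriors = [pi.copy()]
    for t in range(1, len(x)):
        pred = P.T @ pi                              # predict step: chain transition
        lik = np.empty(n_regimes)
        for k, (theta, kappa, sigma) in enumerate(params):
            mean = x[t - 1] + kappa * (theta - x[t - 1]) * dt  # one-step OU mean
            var = sigma ** 2 * dt                              # one-step OU variance
            lik[k] = np.exp(-0.5 * (x[t] - mean) ** 2 / var) / np.sqrt(2 * np.pi * var)
        pi = pred * lik                              # update step: Bayes rule
        pi /= pi.sum()                               # normalize to a probability vector
        posteriors.append(pi.copy())
    return np.array(posteriors)
```

Feeding this posterior vector to the agent is the essence of the prob-DDPG variant, as opposed to feeding raw GRU hidden states (hid-DDPG) or a point forecast of the next signal value (reg-DDPG).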
Innovation

Methods, ideas, or system contributions that make the work stand out.

Deep Deterministic Policy Gradient algorithms for trading
Gated Recurrent Unit networks capture temporal dependencies
Filtering latent parameters from partial market information
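The recurrent-encoder-plus-deterministic-actor structure shared by the three algorithms can be sketched in miniature: a GRU cell summarizes the signal history into a hidden state, and a tanh-bounded actor head maps that state to a position, as in a DDPG actor. The weight shapes and the position range [-1, 1] are illustrative assumptions:

```python
import numpy as np

def gru_cell(x_t, h_prev, W, U, b):
    """One GRU step. W, U, b are dicts keyed by gate: update "z",
    reset "r", and candidate "n" (standard GRU equations)."""
    def sigmoid(a):
        return 1.0 / (1.0 + np.exp(-a))
    z = sigmoid(W["z"] @ x_t + U["z"] @ h_prev + b["z"])        # update gate
    r = sigmoid(W["r"] @ x_t + U["r"] @ h_prev + b["r"])        # reset gate
    n = np.tanh(W["n"] @ x_t + U["n"] @ (r * h_prev) + b["n"])  # candidate state
    return (1 - z) * n + z * h_prev

def actor_position(h, w_out):
    """Deterministic actor head: map the encoder state to a position
    in [-1, 1] via a tanh output layer (weights are placeholders)."""
    return np.tanh(w_out @ h)
```

In hid-DDPG the hidden state `h` goes to the actor directly; in the two-step variants the encoder output is first turned into regime probabilities or a signal forecast before reaching the actor.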