🤖 AI Summary
This paper addresses the design of optimal trading strategies under incomplete information, leveraging a regime-switching Ornstein–Uhlenbeck signal. We propose three deep reinforcement learning frameworks integrating Gated Recurrent Units (GRUs) and Deep Deterministic Policy Gradient (DDPG), uniquely embedding probabilistic regime estimation into the decision-making process. Specifically, GRUs model the temporal dynamics of the signal, while a posterior probability estimation module, coupled with a signal prediction component, enables multi-stage information fusion. Our key contribution is replacing deterministic latent-state encoding with a probabilistic regime representation, thereby enhancing both the interpretability and the robustness of trading policies. Empirical evaluation demonstrates that the proposed prob-DDPG algorithm significantly outperforms benchmark methods in regime-switching markets: it achieves higher cumulative returns and lower maximum drawdown, validating the efficacy of explicit latent-regime modeling for adaptive trading decisions.
📝 Abstract
Reinforcement Learning (RL) applied to financial problems has been a lively area of research. The use of RL for optimal trading strategies that exploit latent information in the market is, to the best of our knowledge, not widely tackled. In this paper we study an optimal trading problem where a trading signal follows an Ornstein-Uhlenbeck process with regime-switching dynamics. We employ a blend of RL and Recurrent Neural Networks (RNNs) to extract the underlying information from a trading signal with latent parameters.
The latent parameters driving the mean-reversion level, speed, and volatility are filtered from observations of the signal, and trading strategies are derived via RL. To address this problem, we propose three Deep Deterministic Policy Gradient (DDPG)-based algorithms that integrate Gated Recurrent Unit (GRU) networks to capture temporal dependencies in the signal. The first, a one-step approach (hid-DDPG), directly encodes hidden states from the GRU into the RL trader. The second and third are two-step methods: one (prob-DDPG) makes use of posterior regime probability estimates, while the other (reg-DDPG) relies on forecasts of the next signal value. Through extensive simulations with increasingly complex Markovian regime dynamics for the trading signal's parameters, as well as an empirical application to equity pair trading, we find that prob-DDPG achieves superior cumulative rewards and exhibits more interpretable strategies. By contrast, reg-DDPG provides limited benefits, while hid-DDPG offers intermediate performance with less interpretable strategies. Our results show that the quality and structure of the information supplied to the agent are crucial: embedding probabilistic insights into latent regimes substantially improves both the profitability and the robustness of reinforcement learning-based trading strategies.
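To make the setting concrete, the sketch below simulates a two-regime regime-switching Ornstein-Uhlenbeck signal and computes the posterior regime probabilities that a method like prob-DDPG would feed to its agent, using a discrete Bayesian filter over the signal increments. The parameter values, the two-regime setup, and the Euler-Maruyama discretization are illustrative assumptions, not the paper's exact specification.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical regimes: (mean level theta, reversion speed kappa, volatility sigma).
# Values are illustrative, not taken from the paper.
params = [(0.0, 2.0, 0.2), (1.0, 8.0, 0.5)]
P = np.array([[0.99, 0.01],   # per-step Markov transition matrix of the latent regime
              [0.02, 0.98]])
dt, T = 0.01, 2000

# --- Simulate the regime-switching OU signal (Euler-Maruyama) ---
x = np.zeros(T)
regimes = np.zeros(T, dtype=int)
for t in range(1, T):
    regimes[t] = rng.choice(2, p=P[regimes[t - 1]])
    theta, kappa, sigma = params[regimes[t]]
    x[t] = x[t - 1] + kappa * (theta - x[t - 1]) * dt \
         + sigma * np.sqrt(dt) * rng.standard_normal()

# --- Discrete Bayesian filter: posterior regime probabilities from the signal ---
def regime_posterior(x, params, P, dt):
    p = np.full(len(params), 1.0 / len(params))   # uniform prior over regimes
    post = np.zeros((len(x), len(params)))
    post[0] = p
    for t in range(1, len(x)):
        pred = P.T @ p                            # predict: propagate through the chain
        lik = np.array([                          # Gaussian likelihood of the increment
            np.exp(-(x[t] - x[t - 1] - k * (th - x[t - 1]) * dt) ** 2
                   / (2 * s ** 2 * dt)) / (s * np.sqrt(2 * np.pi * dt))
            for th, k, s in params
        ])
        p = pred * lik
        p /= p.sum()                              # normalize (Bayes update)
        post[t] = p
    return post

post = regime_posterior(x, params, P, dt)
```

In the paper's two-step architecture this filtering role is learned by a GRU-based module rather than computed in closed form; the sketch only shows what "posterior regime probability estimates" means as an input to the trading agent.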