🤖 AI Summary
This work addresses the optimization of temporal window length, a critical hyperparameter in financial deep reinforcement learning. We explicitly model the CNN input time window (2–12 weeks) as a tunable hyperparameter and systematically evaluate its impact on trading strategy performance within PPO and A2C frameworks. Methodologically, we introduce a company-grouped feature reordering mechanism to uncover the interaction between temporal scale and data structure. Empirical results show that longer windows improve risk-adjusted returns when features are grouped by company, whereas shorter windows work best when features remain arranged by category. The findings are validated across two complementary datasets that cover the same 30 constituent stocks of the Dow Jones Industrial Average with different feature sets, confirming robustness. Our proposed strategy consistently outperforms mainstream financial products, including the Global X Guru ETF, achieving statistically significant annualized alpha. This study provides both methodological insight into temporal representation learning and practical guidance for hyperparameter tuning in algorithmic trading systems.
📝 Abstract
This paper investigates the optimization of temporal windows in Financial Deep Reinforcement Learning (DRL) models using 2D Convolutional Neural Networks (CNNs). We propose that the temporal field, the span of past observations presented to the CNN policy, can and should be treated as a hyperparameter of these models, and we examine its impact on model performance across datasets and feature arrangements. We assess the significance of this temporal field by iteratively expanding the observation window presented to the CNN policy during deep reinforcement learning, progressively increasing it from two weeks to twelve weeks. This window expansion is implemented in two settings. In the first setting, we rearrange the features in the dataset to group them by company, giving the model a full view of each company's data within its observation window and CNN kernel. In the second setting, we do not group the features by company, and features are arranged by category. Our study reveals that shorter temporal windows are most effective when features are not grouped by company; once the features are rearranged by company, the model makes use of longer temporal windows and yields better performance. To examine the consistency of our findings, we repeated the experiment on two datasets containing the same thirty companies from the Dow Jones Industrial Average, with different features in each dataset, and consistently observed these patterns. The result is a trading model that significantly outperforms established financial products such as the Global X Guru ETF from Mirae Asset.
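The two ingredients described in the abstract, company-grouped feature ordering and a variable-length observation window, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The function names `group_by_company` and `sliding_windows`, the category-major input layout (all companies for feature 0, then all companies for feature 1, and so on), and the conversion of five trading days per week are assumptions made for illustration only.

```python
import numpy as np

TRADING_DAYS_PER_WEEK = 5  # assumption: one week is five trading days


def group_by_company(data, n_companies, n_features):
    """Reorder columns from category-major to company-major.

    Assumed input layout:  column = feature * n_companies + company
    Output layout:         column = company * n_features + feature
    """
    t = data.shape[0]
    by_feature = data.reshape(t, n_features, n_companies)
    by_company = by_feature.transpose(0, 2, 1)
    return by_company.reshape(t, n_companies * n_features)


def sliding_windows(data, window_days):
    """Stack overlapping windows into shape (n_windows, window_days, n_columns)."""
    # sliding_window_view appends the window axis last, so move it to axis 1.
    view = np.lib.stride_tricks.sliding_window_view(data, window_days, axis=0)
    return view.transpose(0, 2, 1)


# Toy data: 60 trading days, 3 companies, 2 features per company.
toy = np.arange(60 * 6, dtype=float).reshape(60, 6)
grouped = group_by_company(toy, n_companies=3, n_features=2)

# Treat the window length as a hyperparameter swept from 2 to 12 weeks.
for weeks in range(2, 13, 2):
    obs = sliding_windows(grouped, weeks * TRADING_DAYS_PER_WEEK)
    print(weeks, obs.shape)
```

Each window is a 2D array of time steps by features, the kind of image-like input a 2D CNN policy can consume. With the company-major ordering, columns belonging to one company sit next to each other, so a convolution kernel spanning adjacent columns covers a single company's features rather than one feature across several companies.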