Mitigating Bias in Low-SNR Financial Reinforcement Learning via Quantum Representations

📅 2026-06-09

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the vulnerability of off-policy maximum-entropy reinforcement learning methods, such as Soft Actor-Critic (SAC), to noisy state observations in low signal-to-noise ratio (SNR) financial environments. Such noise biases Q-value estimates, and bootstrapping amplifies the bias, a failure mode the authors call the "Financial Entropy Trap". To mitigate it, the authors propose FPQC-SAC, which places a compact, bounded parameterized quantum circuit (PQC) in front of both the policy and value networks. The PQC constrains feature propagation at the representation level, reducing the impact of extreme market fluctuations on Bellman target estimation, while trainable quantum entanglement preserves flexible cross-asset interactions. This plug-and-play design acts before bootstrapping, in contrast to approaches that filter raw inputs or regularize Q-values afterwards. On real-world portfolio management tasks, the approach achieves a 66.89% relative gain in cumulative return over standard SAC and outperforms the best continuous-control deep reinforcement learning baseline by approximately 27%, while also improving out-of-sample stability.

📝 Abstract

The financial market is a typical low signal-to-noise ratio (SNR) setting, which often destabilizes off-policy maximum-entropy methods like Soft Actor-Critic (SAC). Specifically, noisy state representations may produce unreliable Q-value estimates, and bootstrapping amplifies these errors, forming a failure mode we call the "Financial Entropy Trap". In this paper, we propose FPQC-SAC, an efficient and plug-and-play SAC variant that places a compact and bounded Parameterized Quantum Circuit (PQC) before the actor and critic networks to constrain feature propagation at the representation level, rather than filtering raw inputs or regularizing Q-values after bootstrapping. Notably, FPQC-SAC reduces the impact of extreme market fluctuations on Bellman target estimation, while trainable quantum entanglement preserves flexible cross-asset interactions. Empirical evaluations on real-world portfolio management tasks demonstrate that FPQC-SAC substantially enhances out-of-sample stability and cumulative returns, achieving a 66.89% relative gain in cumulative return over standard unconstrained SAC and outperforming the best continuous-control deep reinforcement learning baseline by approximately 27%. Open-source code is available at https://github.com/ZeyuLIU-UST/FPQC-SAC-main.
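
To make the idea of a bounded PQC front end more concrete, the following is a minimal NumPy sketch of a small simulated circuit that maps raw features to values in [-1, 1]. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not specify the circuit, so the `tanh`-scaled RY angle encoding, the ring of controlled-RY gates standing in for "trainable entanglement", the Pauli-Z readout, and all names (`pqc_features`, `rot_params`, `ent_params`) are hypothetical choices.

```python
import numpy as np

def ry(theta):
    """Single-qubit RY rotation matrix."""
    c, s = np.cos(theta / 2.0), np.sin(theta / 2.0)
    return np.array([[c, -s], [s, c]])

def apply_1q(state, gate, q):
    """Apply a 2x2 gate to qubit q of a state tensor of shape (2,)*n."""
    state = np.tensordot(gate, state, axes=([1], [q]))
    return np.moveaxis(state, 0, q)

def apply_cry(state, theta, control, target):
    """Controlled-RY: rotate `target` only on the control = |1> slice."""
    state = state.copy()
    idx = [slice(None)] * state.ndim
    idx[control] = 1
    sub = state[tuple(idx)]  # indexing drops the control axis
    t = target if target < control else target - 1
    state[tuple(idx)] = apply_1q(sub, ry(theta), t)
    return state

def pqc_features(x, rot_params, ent_params):
    """Map n >= 2 raw features to n Pauli-Z expectations in [-1, 1].

    x          : shape (n,), raw (possibly extreme) features
    rot_params : shape (layers, n), trainable RY angles (assumed layout)
    ent_params : shape (layers, n), trainable CRY angles (assumed layout)
    """
    n = x.shape[0]
    state = np.zeros((2,) * n)
    state[(0,) * n] = 1.0  # start in |0...0>
    # Assumed encoding: squash each feature with tanh, then use it as an RY angle.
    for q in range(n):
        state = apply_1q(state, ry(np.pi * np.tanh(x[q])), q)
    for rot, ent in zip(rot_params, ent_params):
        for q in range(n):
            state = apply_1q(state, ry(rot[q]), q)
        # Ring of controlled rotations; each angle sets how strongly
        # qubit q influences its neighbour.
        for q in range(n):
            state = apply_cry(state, ent[q], q, (q + 1) % n)
    probs = np.abs(state) ** 2
    z = np.empty(n)
    for q in range(n):
        other = tuple(a for a in range(n) if a != q)
        p = probs.sum(axis=other)  # marginal distribution of qubit q
        z[q] = p[0] - p[1]         # <Z> = P(0) - P(1)
    return z

rng = np.random.default_rng(0)
x = np.array([1e6, -3.0, 0.02, -1e4])  # includes extreme "market shock" values
features = pqc_features(x, rng.normal(size=(2, 4)), rng.normal(size=(2, 4)))
```

In a setup like the one the abstract describes, `features` would be passed to the actor and critic networks in place of the raw state. Because every output is an expectation value of a Pauli-Z observable, it stays in [-1, 1] regardless of how large the inputs are, which is one plausible reading of "bounded" here. The controlled-rotation angles let one asset's feature influence another's readout.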

Problem

Research questions and friction points this paper is trying to address.

low-SNR

financial reinforcement learning

bias mitigation

entropy trap

Q-value estimation

Innovation

Methods, ideas, or system contributions that make the work stand out.

Quantum Representation

Low-SNR Reinforcement Learning

Financial Entropy Trap