Quantum Policy Gradient in Reproducing Kernel Hilbert Space

📅 2024-11-11
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses inefficient policy representation and optimization in quantum reinforcement learning (QRL). It introduces quantum kernel methods, previously overlooked in QRL, for policy modeling, and constructs parametric and non-parametric policy gradient and Actor-Critic algorithms in a reproducing kernel Hilbert space (RKHS) that support vector-valued action spaces. It derives analytic forms for the quantum policy gradient and designs stochastic and deterministic quantum Actor-Critic frameworks. In quantum environments, all proposed algorithms achieve a quadratic reduction in query complexity over their classical counterparts, from $O(1/\varepsilon^2)$ to $O(1/\varepsilon)$. Under favourable conditions, such as gradient smoothness, both Actor-Critic variants achieve further query complexity reductions relative to the quantum policy gradient algorithms. The core contributions are a unified framework for quantum kernel-based policy representation, analytic gradient derivation, and provable query complexity reductions.
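To give a sense of scale for the quadratic reduction, the following worked arithmetic compares the two rates at a single target accuracy. The value $\varepsilon = 10^{-3}$ is an arbitrary choice for illustration, and the comparison ignores constants, logarithmic factors, and any other problem-dependent quantities that the summary does not state.

```latex
\[
\varepsilon = 10^{-3}: \qquad
\frac{1}{\varepsilon^{2}} = 10^{6}
\quad\text{versus}\quad
\frac{1}{\varepsilon} = 10^{3}.
\]
```

Under these simplifying assumptions, the query count scales with the square root of the classical count, so the gap widens as the target accuracy tightens.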

📝 Abstract
Parametrised quantum circuits offer expressive and data-efficient representations for machine learning. Due to quantum states residing in a high-dimensional Hilbert space, parametrised quantum circuits have a natural interpretation in terms of kernel methods. The representation of quantum circuits in terms of quantum kernels has been studied widely in quantum supervised learning, but has been overlooked in the context of quantum RL. This paper proposes parametric and non-parametric policy gradient and actor-critic algorithms with quantum kernel policies in quantum environments. This approach, implemented with both numerical and analytical quantum policy gradient techniques, allows exploiting the many advantages of kernel methods, including available analytic forms for the gradient of the policy and tunable expressiveness. The proposed approach is suitable for vector-valued action spaces and each of the formulations demonstrates a quadratic reduction in query complexity compared to their classical counterparts. Two actor-critic algorithms, one based on stochastic policy gradient and one based on deterministic policy gradient (comparable to the popular DDPG algorithm), demonstrate additional query complexity reductions compared to quantum policy gradient algorithms under favourable conditions.
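The "available analytic forms for the gradient of the policy" can be illustrated with a purely classical analogue. The sketch below is not the paper's algorithm: it assumes a Gaussian policy whose mean is a kernel expansion, $\mu(s) = \sum_i k(c_i, s)\,\beta_i$, with vector-valued weights $\beta_i$, and it uses a classical RBF kernel as a stand-in for a quantum kernel. The names `rbf_kernel`, `GaussianKernelPolicy`, `centres`, and `weights` are hypothetical and chosen for illustration only. Under these assumptions the score function has the closed form $\nabla_{\beta_i} \log \pi(a \mid s) = k(c_i, s)\,(a - \mu(s))/\sigma^2$.

```python
import numpy as np


def rbf_kernel(x, y, lengthscale=1.0):
    """Classical Gaussian (RBF) kernel; a stand-in for a quantum kernel (assumption)."""
    diff = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    return float(np.exp(-np.dot(diff, diff) / (2.0 * lengthscale**2)))


class GaussianKernelPolicy:
    """Gaussian policy with kernel-expansion mean and vector-valued actions.

    Hypothetical illustration: mean(s) = sum_i k(c_i, s) * weights[i].
    """

    def __init__(self, centres, weights, sigma=0.5):
        self.centres = np.asarray(centres, dtype=float)  # shape (N, state_dim)
        self.weights = np.asarray(weights, dtype=float)  # shape (N, action_dim)
        self.sigma = float(sigma)                        # fixed exploration noise

    def features(self, state):
        # Kernel evaluations between the state and every centre, shape (N,).
        return np.array([rbf_kernel(c, state) for c in self.centres])

    def mean(self, state):
        # Kernel expansion of the policy mean, shape (action_dim,).
        return self.features(state) @ self.weights

    def sample(self, state, rng):
        mu = self.mean(state)
        return mu + self.sigma * rng.standard_normal(mu.shape)

    def log_prob(self, state, action):
        resid = np.asarray(action, dtype=float) - self.mean(state)
        dim = resid.size
        return float(
            -0.5 * resid @ resid / self.sigma**2
            - 0.5 * dim * np.log(2.0 * np.pi * self.sigma**2)
        )

    def grad_log_prob(self, state, action):
        # Analytic score: d log pi / d weights[i] = k(c_i, s) * (a - mean(s)) / sigma^2.
        resid = (np.asarray(action, dtype=float) - self.mean(state)) / self.sigma**2
        return np.outer(self.features(state), resid)     # shape (N, action_dim)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    policy = GaussianKernelPolicy(
        centres=rng.standard_normal((4, 3)),   # 4 centres, 3-dimensional states
        weights=rng.standard_normal((4, 2)),   # 2-dimensional actions
    )
    state = rng.standard_normal(3)
    action = policy.sample(state, rng)
    print(policy.grad_log_prob(state, action).shape)
```

In a REINFORCE-style update, this score would be multiplied by a return estimate and averaged over sampled trajectories. How the paper estimates that expectation with quantum queries is not shown here.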
Problem

Research questions and friction points this paper is trying to address.

Quantum policy gradient in RKHS
Quantum kernel methods in RL
Quadratic query complexity reduction
Innovation

Methods, ideas, or system contributions that make the work stand out.

Quantum kernel policies in RL
Numerical and analytical gradient techniques
Quadratic reduction in query complexity
Authors
David M. Bossens
Institute of High Performance Computing (IHPC), Agency for Science, Technology and Research (A*STAR); Centre for Frontier AI Research (CFAR), Agency for Science, Technology and Research (A*STAR)
Kishor Bharti
IHPC@A*STAR; Past: QuICS, JQI, NIST, CQT
Quantum Computation
Jayne Thompson
Institute of High Performance Computing (IHPC), Agency for Science, Technology and Research (A*STAR)