Improved Offline Reinforcement Learning via Quantum Metric Encoding

📅 2025-11-13
📈 Citations: 0
✨ Influential: 0
📄 PDF
🤖 AI Summary
To address the performance limitations of offline reinforcement learning (RL) in few-shot settings, this paper proposes the Quantum Metric Encoder (QME): inspired by quantum circuits, QME introduces a trainable unitary embedding module that encodes states compactly and geometrically while explicitly reducing the Δ-hyperbolicity of the state space, thereby enhancing policy generalization. QME is compatible with both classical computation and near-term quantum hardware, and it integrates seamlessly into mainstream offline RL algorithms such as Soft Actor-Critic (SAC) and Implicit Q-Learning (IQL). On three benchmark datasets containing only about 100 samples each, QME boosts the maximum rewards of SAC and IQL by 116.2% and 117.6% on average, respectively, significantly advancing few-shot offline RL. Its core contribution is to introduce unitary-transformation-driven geometric regularization into offline RL for the first time, enabling efficient, interpretable, and hardware-friendly state representation learning.
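The "trainable unitary embedding" idea can be pictured with a minimal, classically simulable sketch. The snippet below is an illustrative stand-in, not the paper's actual circuit ansatz: it assumes amplitude encoding of the classical state and parameterizes the unitary as U = exp(-iH) for a trainable Hermitian matrix H (the names `qme_embed` and `unitary_from_hermitian` are hypothetical).

```python
import numpy as np

def unitary_from_hermitian(h):
    """U = exp(-iH) via eigendecomposition; H Hermitian guarantees U is unitary."""
    w, v = np.linalg.eigh(h)
    return v @ np.diag(np.exp(-1j * w)) @ v.conj().T

def qme_embed(state, h):
    """Amplitude-encode a classical state, apply the trainable unitary,
    and read out measurement probabilities as the embedded features."""
    psi = state.astype(complex)
    psi = psi / np.linalg.norm(psi)        # amplitude encoding
    phi = unitary_from_hermitian(h) @ psi  # unitary "circuit" layer
    return np.abs(phi) ** 2                # Born-rule readout as features

rng = np.random.default_rng(0)
a = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
h = (a + a.conj().T) / 2                   # stand-in for trained parameters
features = qme_embed(np.array([0.2, -1.0, 0.5, 0.3]), h)
```

Because the map is unitary, the embedded features always form a valid probability distribution over basis states; training would adjust H (e.g. by gradient descent) so that these features make downstream SAC/IQL learning easier.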

๐Ÿ“ Abstract
Reinforcement learning (RL) with limited samples is common in real-world applications. However, offline RL performance under this constraint is often suboptimal. We consider an alternative approach to dealing with limited samples by introducing the Quantum Metric Encoder (QME). In this methodology, instead of applying the RL framework directly to the original states and rewards, we embed the states into a more compact and meaningful representation, where the structure of the encoding is inspired by quantum circuits. For classical data, QME is a classically simulable, trainable unitary embedding and thus serves as a quantum-inspired module on a classical device. For quantum data in the form of quantum states, QME can be implemented directly on quantum hardware, allowing for training without measurement or re-encoding. We evaluate QME on three datasets, each limited to 100 samples. We use Soft Actor-Critic (SAC) and Implicit Q-Learning (IQL), two well-known RL algorithms, to demonstrate the effectiveness of our approach. From the experimental results, we find that training offline RL agents on QME-embedded states with decoded rewards yields significantly better performance than training on the original states and rewards. On average across the three datasets, for maximum reward performance, we achieve a 116.2% improvement for SAC and 117.6% for IQL. We further investigate the $\Delta$-hyperbolicity of our framework, a geometric property of the state space known to be important for RL training efficacy. The QME-embedded states exhibit low $\Delta$-hyperbolicity, suggesting that the improvement after embedding arises from the modified geometry of the state space induced by QME. Thus, the low $\Delta$-hyperbolicity and the corresponding effectiveness of QME could provide valuable information for developing efficient offline RL methods under limited-sample conditions.
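The Δ-hyperbolicity mentioned in the abstract is Gromov's four-point condition: for any four points, take the three pairwise-distance sums, and Δ is half the gap between the two largest; a space is "more hyperbolic" (more tree-like) when this value is small. A minimal sketch of an exact estimator, assuming Euclidean distance between embedded states (the paper's metric may differ, and `delta_hyperbolicity` is a hypothetical helper name):

```python
import itertools
import numpy as np

def delta_hyperbolicity(points):
    """Exact Gromov delta via the four-point condition over all quadruples.

    `points` is an (n, d) array of embedded states; Euclidean distance is
    an illustrative choice. O(n^4), so in practice one samples quadruples.
    """
    d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    delta = 0.0
    for x, y, z, w in itertools.combinations(range(len(points)), 4):
        sums = sorted([d[x, y] + d[z, w],
                       d[x, z] + d[y, w],
                       d[x, w] + d[y, z]])
        # half the gap between the two largest pairwise-distance sums
        delta = max(delta, (sums[2] - sums[1]) / 2)
    return delta

# Collinear points are tree-like, hence 0-hyperbolic:
line = np.array([[float(i), 0.0] for i in range(6)])
print(delta_hyperbolicity(line))  # -> 0.0
```

Computing this quantity before and after embedding is one way to check the paper's claim that QME lowers the Δ-hyperbolicity of the state space.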
Problem

Research questions and friction points this paper is trying to address.

Offline reinforcement learning performs poorly with limited sample sizes
Classical state representations lack compactness for effective RL training
Existing methods struggle with geometric properties of state spaces
Innovation

Methods, ideas, or system contributions that make the work stand out.

Quantum-inspired metric encoder for state embedding
Classically simulable unitary embedding for classical data
Direct quantum hardware implementation for quantum data
🔎 Similar Papers
No similar papers found.
Outongyi Lv
School of Mathematical Sciences, Shanghai Jiao Tong University, Shanghai 200240, China
Yewei Yuan
Global College, Shanghai Jiao Tong University, Shanghai 200240, China
Nana Liu
Shanghai Jiao Tong University
Quantum computation · continuous variables · scientific computing · metrology · quantum machine learning