Pretraining a Shared Q-Network for Data-Efficient Offline Reinforcement Learning

📅 2025-05-09
📈 Citations: 0
Influential: 0
🤖 AI Summary
Offline reinforcement learning (RL) suffers from low sample efficiency due to the limited scale and high acquisition cost of static offline datasets. To address this, we propose a plug-and-play Q-network pretraining framework: a shared-weight architecture jointly regresses both next-state predictions and Q-values in a supervised manner, enabling effective pretraining without altering downstream algorithms. Our method seamlessly integrates with state-of-the-art offline RL algorithms—including CQL and TD3+BC—without architectural or training modifications. Evaluated on D4RL, Robomimic, V-D4RL, and ExoRL benchmarks, our approach achieves superior performance using only 10% of the original dataset, consistently outperforming full-dataset baselines—especially under low-quality and sparsely distributed data regimes. It significantly alleviates the data-efficiency bottleneck in offline RL. The proposed paradigm offers a general, efficient, and scalable pretraining solution for few-shot offline RL.

📝 Abstract
Offline reinforcement learning (RL) aims to learn a policy from a static dataset without further interactions with the environment. Collecting sufficiently large datasets for offline RL is exhausting, since data collection requires a colossal number of interactions with environments and becomes difficult when interaction with the environment is restricted. Hence, how an agent learns the best policy from a minimal static dataset is a crucial issue in offline RL, analogous to the sample-efficiency problem in online RL. In this paper, we propose a simple yet effective plug-and-play pretraining method that initializes the features of a $Q$-network to enhance data efficiency in offline RL. Specifically, we introduce a shared $Q$-network structure that outputs predictions of both the next state and the $Q$-value. We pretrain the shared $Q$-network on a supervised regression task that predicts the next state, and then train it using diverse offline RL methods. Through extensive experiments, we empirically demonstrate that our method enhances the performance of existing popular offline RL methods on the D4RL, Robomimic, and V-D4RL benchmarks. Furthermore, we show that our method significantly boosts data-efficient offline RL across various data qualities and data distributions through the D4RL and ExoRL benchmarks. Notably, our method adapted with only 10% of the dataset outperforms standard algorithms even with full datasets.
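The shared-network idea described above can be sketched as follows. This is a hypothetical illustration, not the paper's exact architecture: a common feature trunk over (state, action) pairs feeds two heads, one regressing the next state (used for supervised pretraining) and one outputting the $Q$-value (used by the downstream offline RL algorithm). Layer sizes and names are assumptions.

```python
import torch
import torch.nn as nn

class SharedQNetwork(nn.Module):
    """Illustrative shared Q-network: one trunk, two heads."""

    def __init__(self, state_dim, action_dim, hidden_dim=256):
        super().__init__()
        # Shared feature extractor over concatenated (state, action)
        self.trunk = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
        )
        self.next_state_head = nn.Linear(hidden_dim, state_dim)  # pretraining target
        self.q_head = nn.Linear(hidden_dim, 1)                   # offline-RL target

    def forward(self, state, action):
        h = self.trunk(torch.cat([state, action], dim=-1))
        return self.next_state_head(h), self.q_head(h)

def pretrain_step(net, optimizer, state, action, next_state):
    """One supervised pretraining step: regress the next state from the
    offline dataset. The trunk weights learned here are reused when the
    Q-head is later trained by CQL, TD3+BC, etc."""
    pred_next, _ = net(state, action)
    loss = nn.functional.mse_loss(pred_next, next_state)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Because the downstream algorithm only ever queries the $Q$-value head, this structure is what makes the method plug-and-play: no change to the offline RL training loop is needed beyond starting from the pretrained trunk.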
Problem

Research questions and friction points this paper is trying to address.

Enhancing data efficiency in offline reinforcement learning
Learning optimal policies with minimal static datasets
Improving performance across diverse data qualities and distributions
Innovation

Methods, ideas, or system contributions that make the work stand out.

Pretrains shared Q-network for offline RL
Predicts next state and Q-value jointly
Enhances performance with minimal datasets
Jongchan Park (Lunit Inc.)
Mingyu Park (KAIST)
Donghwan Lee (KAIST)