Efficient and Uncertainty-Aware Diffusion Framework for Offline-to-Online Reinforcement Learning

📅 2026-05-28

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the performance degradation that distribution shift causes in offline-to-online reinforcement learning. It proposes DUAL, a framework that combines the prior knowledge of a diffusion model with uncertainty quantification. In the offline phase, DUAL uses the diffusion model to distill a fast-sampling actor policy and a transition model; in the online phase, it balances exploration and exploitation using a Laplace approximation together with distance-based detection of state-transition shift. Empirically, DUAL improves the online expected return over existing offline-to-online reinforcement learning baselines across multiple settings and environments.
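The distance-based uncertainty signal mentioned above can be illustrated with a small sketch. This is only one possible reading, not the paper's method: it assumes the signal is the Euclidean distance between the next state predicted by a transition model and the next state actually observed online. The names `toy_transition_model`, `transition_shift_score`, `is_shifted`, and the threshold value are all hypothetical.

```python
import math


def toy_transition_model(state, action):
    """Hypothetical stand-in for a learned transition model: s' = s + a."""
    return [s + a for s, a in zip(state, action)]


def transition_shift_score(predicted_next_state, observed_next_state):
    """Euclidean distance between predicted and observed next states."""
    return math.dist(predicted_next_state, observed_next_state)


def is_shifted(score, threshold):
    """Flag a transition as shifted when its score exceeds the threshold."""
    return score > threshold


state, action = [0.0, 1.0], [0.5, -0.5]
predicted = toy_transition_model(state, action)  # [0.5, 0.5]

# An observation close to the prediction versus one far from it.
familiar_score = transition_shift_score(predicted, [0.5, 0.6])
novel_score = transition_shift_score(predicted, [3.5, 4.5])

THRESHOLD = 1.0  # assumed value, chosen only for this toy example
print(is_shifted(familiar_score, THRESHOLD), is_shifted(novel_score, THRESHOLD))
```

In a sketch like this, a large score would mark transitions the offline-trained model explains poorly, which is where an agent might favor exploration.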

📝 Abstract

Offline-to-Online Reinforcement Learning (O2O-RL) leverages an offline, pre-trained policy to minimize costly online interactions. Although data-efficient, O2O-RL is susceptible to shifts between the offline and online distributions. Existing work mitigates the harm of this shift by fine-tuning the policy on trajectory data sampled from a diffusion model. Inspired by this line of work, we propose DUAL: an efficient Diffusion Uncertainty-Aware framework for offline-to-online reinforcement Learning. In the offline phase, DUAL uses the prior knowledge of the diffusion model to distill a fast-sampling diffusion actor policy and a transition model. In the online phase, DUAL employs a Laplace approximation and distance-based transition-state-shift detection, using uncertainty quantification to better balance exploration and exploitation. We formally show that our actor loss with the Laplace approximation provides a proxy for a principled estimate of epistemic uncertainty. Empirically, DUAL improves the online expected return over O2O-RL baselines across multiple settings and environments.
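To make the idea of a Laplace approximation as an epistemic-uncertainty estimate concrete, here is a minimal sketch on a toy model. It is not DUAL's actor loss: it assumes a two-weight linear model with Gaussian noise and a Gaussian prior, a case in which the Laplace approximation is exact. The function names, the feature map `(1, x)`, and the values of `noise_var` and `prior_precision` are illustrative assumptions.

```python
def laplace_posterior_cov(features, noise_var=1.0, prior_precision=1.0):
    """Laplace posterior covariance over two linear weights.

    The Hessian of the negative log posterior is
    H = prior_precision * I + (1 / noise_var) * sum_i phi_i phi_i^T,
    and the Gaussian (Laplace) posterior covariance is H^{-1}.
    """
    a = d = prior_precision
    b = 0.0
    for p0, p1 in features:
        a += p0 * p0 / noise_var
        b += p0 * p1 / noise_var
        d += p1 * p1 / noise_var
    det = a * d - b * b
    # Closed-form inverse of the symmetric 2x2 matrix [[a, b], [b, d]].
    return [[d / det, -b / det], [-b / det, a / det]]


def epistemic_variance(cov, phi):
    """Predictive variance from weight uncertainty: phi^T Sigma phi."""
    p0, p1 = phi
    return (p0 * (cov[0][0] * p0 + cov[0][1] * p1)
            + p1 * (cov[1][0] * p0 + cov[1][1] * p1))


# Toy "offline" inputs; each input x is mapped to the features (1, x).
offline_x = [-1.0, -0.5, 0.0, 0.5, 1.0]
cov = laplace_posterior_cov([(1.0, x) for x in offline_x])

var_in = epistemic_variance(cov, (1.0, 0.0))   # inside the data range
var_out = epistemic_variance(cov, (1.0, 5.0))  # far outside the data range
print(var_in < var_out)
```

The variance grows for inputs far from the offline data, which is the behavior that makes such an estimate usable as a signal for when to explore.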

Problem

Research questions and friction points this paper is trying to address.

Offline-to-Online Reinforcement Learning

distribution shift

epistemic uncertainty

data efficiency

reinforcement learning

Innovation

Methods, ideas, or system contributions that make the work stand out.

Diffusion Model

Uncertainty Quantification

Offline-to-Online RL