Bridging Domain Gaps with Target-Aligned Generation for Offline Reinforcement Learning

📅 2026-05-13
📈 Citations: 0
Influential: 0

🤖 AI Summary
This work addresses distributional mismatch in cross-domain offline reinforcement learning, where the source and target environments differ in dynamics and target-domain data is extremely scarce. The paper proposes the Target-aligned Coverage Expansion (TCE) framework, which uses theoretical analysis to decide how source data should be used: transitions close to the target domain are incorporated directly, while a dual score-based generative model synthesizes target-consistent transitions over an expanded state region. Across diverse cross-domain environments, TCE consistently outperforms state-of-the-art cross-domain offline RL baselines.
📝 Abstract
Cross-domain offline reinforcement learning aims to adapt a policy from a source domain to a target domain using only pre-collected datasets, where environment dynamics may differ. A key challenge is to leverage source data while reducing distributional mismatch, particularly when the target dataset is extremely limited. To address this, we propose Target-aligned Coverage Expansion (TCE), a framework that decides how source data should be used, either by directly incorporating target-near transitions or by expanding state coverage through target-aligned generation, guided by theoretical analysis. TCE builds on a dual score-based generative model to synthesize target-consistent transitions over an expanded state region. Extensive experiments across diverse cross-domain environments show that TCE consistently outperforms state-of-the-art cross-domain offline RL baselines.
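The "use directly or generate" decision described in the abstract can be illustrated with a minimal sketch. This is not the paper's algorithm: it assumes transitions are flattened (state, action, next_state) tuples and substitutes a simple nearest-neighbor distance threshold for the theoretical criterion, with no generative model involved. The names `Transition`, `nearest_target_distance`, `split_source`, and `threshold` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical layout: a flattened (state, action, next_state) vector.
Transition = Tuple[float, ...]


def nearest_target_distance(x: Sequence[float], target: List[Transition]) -> float:
    """Euclidean distance from x to its closest transition in the target dataset."""
    return min(math.dist(x, t) for t in target)


def split_source(
    source: List[Transition], target: List[Transition], threshold: float
) -> Tuple[List[Transition], List[Transition]]:
    """Partition source transitions by proximity to the target data.

    Assumption: a plain distance threshold stands in for the paper's
    theoretically guided criterion. Transitions in `near` would be reused
    directly; those in `far` would be candidates for target-aligned
    generation, which this sketch does not implement.
    """
    near: List[Transition] = []
    far: List[Transition] = []
    for tr in source:
        if nearest_target_distance(tr, target) <= threshold:
            near.append(tr)
        else:
            far.append(tr)
    return near, far


# Toy data: one target transition and two source transitions.
target_data = [(0.0, 0.0, 0.1)]
source_data = [(0.0, 0.0, 0.12), (1.0, 0.5, 2.0)]
near, far = split_source(source_data, target_data, threshold=0.1)
```

In this toy run the first source transition lies within the threshold of the target sample and would be kept as is, while the second falls outside it and would be handed to a generative step for target-consistent relabeling.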
Problem

Research questions and friction points this paper is trying to address.

offline reinforcement learning
cross-domain adaptation
distributional mismatch
domain gap
target-aligned generation
Innovation

Methods, ideas, or system contributions that make the work stand out.

target-aligned generation
offline reinforcement learning
cross-domain adaptation
score-based generative model
distributional mismatch