🤖 AI Summary
This work addresses distributional mismatch in cross-domain offline reinforcement learning, where environment dynamics may differ between the source and target domains and the target dataset is extremely limited. The paper proposes Target-aligned Coverage Expansion (TCE), a framework that uses theoretical analysis to decide how source data should be used: either by directly incorporating target-near transitions or by expanding state coverage through target-aligned generation. TCE builds on a dual score-based generative model to synthesize target-consistent transitions over the expanded state region. Across diverse cross-domain environments, TCE consistently outperforms state-of-the-art cross-domain offline RL baselines.
📝 Abstract
Cross-domain offline reinforcement learning aims to adapt a policy from a source domain to a target domain using only pre-collected datasets, where environment dynamics may differ. A key challenge is to leverage source data while reducing distributional mismatch, particularly when the target dataset is extremely limited. To address this, we propose Target-aligned Coverage Expansion (TCE), a framework that decides how source data should be used, either by directly incorporating target-near transitions or by expanding state coverage through target-aligned generation, guided by theoretical analysis. TCE builds on a dual score-based generative model to synthesize target-consistent transitions over an expanded state region. Extensive experiments across diverse cross-domain environments show that TCE consistently outperforms state-of-the-art cross-domain offline RL baselines.
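To make the "directly incorporate or generate" decision concrete, here is a minimal sketch of one way source transitions could be split into a target-near subset and a remainder. It is not TCE's actual criterion, which the abstract does not specify: the nearest-neighbor Euclidean distance, the `threshold` parameter, and the function name `split_source_by_target_proximity` are all assumptions made for illustration.

```python
import numpy as np


def split_source_by_target_proximity(source, target, threshold):
    """Split source transitions by distance to the nearest target transition.

    Hypothetical proxy, NOT the paper's selection rule.
    source: (N, d) array, target: (M, d) array; each row is a flattened
    (state, action, next_state) transition.
    Returns (near_idx, far_idx) as integer index arrays into `source`.
    """
    source = np.asarray(source, dtype=float)
    target = np.asarray(target, dtype=float)

    # Pairwise differences via broadcasting, shape (N, M, d).
    diffs = source[:, None, :] - target[None, :, :]
    # Euclidean distance from every source row to every target row, shape (N, M).
    dists = np.linalg.norm(diffs, axis=-1)
    # Distance from each source transition to its closest target transition.
    nearest = dists.min(axis=1)

    near_mask = nearest <= threshold
    return np.flatnonzero(near_mask), np.flatnonzero(~near_mask)


# Toy data: two target transitions and three source transitions in 2-D.
target = np.array([[0.0, 0.0], [1.0, 1.0]])
source = np.array([[0.1, 0.0], [5.0, 5.0], [1.0, 0.9]])

near_idx, far_idx = split_source_by_target_proximity(source, target, threshold=0.5)
print(near_idx.tolist())  # indices of source rows close to some target row
print(far_idx.tolist())   # indices of source rows far from every target row
```

In this sketch, the rows in `near_idx` stand in for transitions that would be used directly, and the rows in `far_idx` mark regions where a generative model would have to supply target-consistent transitions instead.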