Reinforcement Learning from Cross-domain Videos with Video Prediction Model

📅 2026-06-02

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses learning from expert videos in reinforcement learning when no reward signal is available and the expert videos come from a visually different domain, for example because the agent differs in color or morphology, or because of the sim-to-real gap. The authors propose XIPER, a method that trains a cross-domain video prediction model to map the agent's observations into the expert video domain and uses the prediction likelihood as a reward signal to guide policy learning. This design targets domain gaps caused by differences in appearance. Experiments show that XIPER consistently outperforms existing baselines across 11 tasks in the DMC Color Suite (8 tasks) and DMC Body Suite (3 tasks). On a sim-to-real transfer dataset, XIPER produces meaningful reward signals for real-robot observations given only simulated expert videos.

📝 Abstract

Reinforcement learning from expert videos across visually distinct domains is challenging due to the absence of reward signals and the presence of domain gaps. We introduce XIPER (Cross-domain Video Prediction Reward), a reward model for learning from expert videos collected in a visually different domain, where the agent's appearance differs due to factors such as color, morphology, or the sim-to-real gap. More specifically, XIPER trains a cross-domain video prediction model that maps agent observations into the expert domain and uses the prediction likelihood as a reward signal. Experiments on the DMC Color Suite (8 tasks) and DMC Body Suite (3 tasks) show that XIPER consistently outperforms baselines despite domain gaps such as differences in agent color and morphology. We further analyze XIPER on a sim-to-real transfer dataset, demonstrating that it produces meaningful reward signals for real-robot observations given only simulated expert videos. Code, pretrained models, datasets and video demonstrations can be found on our project webpage: https://sites.google.com/view/xiper
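The core idea, rewarding the agent by how likely its observations are under a video prediction model of the expert domain, can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a hypothetical split into a domain-mapping function (to_expert_domain) and an expert-domain next-frame predictor (predict_next), whereas XIPER trains a single cross-domain video prediction model. It also substitutes an isotropic Gaussian log-likelihood for whatever likelihood the real model uses. All names here (prediction_likelihood_rewards, recolor, extrapolate, the toy frames) are illustrative only.

```python
import math
from typing import Callable, List, Sequence

# Toy "frame": a flattened list of pixel intensities (hypothetical stand-in for an image).
Frame = List[float]


def gaussian_log_likelihood(target: Sequence[float], mean: Sequence[float], std: float) -> float:
    """Log density of `target` under an isotropic Gaussian N(mean, std^2 I).

    Assumption: a Gaussian stands in for the video model's actual likelihood.
    """
    var = std * std
    return sum(
        -0.5 * ((x - mu) ** 2 / var + math.log(2.0 * math.pi * var))
        for x, mu in zip(target, mean)
    )


def prediction_likelihood_rewards(
    agent_frames: Sequence[Frame],
    to_expert_domain: Callable[[Frame], Frame],
    predict_next: Callable[[Sequence[Frame]], Frame],
    std: float = 0.1,
) -> List[float]:
    """Hypothetical reward r_t = log p(x_{t+1} | x_{1:t}) on frames mapped into the expert domain."""
    mapped = [to_expert_domain(f) for f in agent_frames]
    rewards = []
    for t in range(1, len(mapped)):
        predicted = predict_next(mapped[:t])  # prediction from the history x_{1:t}
        rewards.append(gaussian_log_likelihood(mapped[t], predicted, std))
    return rewards


# --- Toy usage (all hypothetical) ---
def recolor(frame: Frame) -> Frame:
    """Pretend domain mapping: undo a uniform +0.5 color offset of the agent."""
    return [p - 0.5 for p in frame]


def extrapolate(history: Sequence[Frame]) -> Frame:
    """Pretend expert-domain predictor: constant-velocity extrapolation."""
    if len(history) < 2:
        return list(history[-1])
    prev, last = history[-2], history[-1]
    return [2.0 * b - a for a, b in zip(prev, last)]


smooth = [[0.5 + 0.1 * t, 0.5 + 0.2 * t] for t in range(5)]  # steady motion the predictor expects
jerky = [[0.5 + (0.3 if t % 2 else 0.0), 0.5] for t in range(5)]  # oscillating motion
r_smooth = prediction_likelihood_rewards(smooth, recolor, extrapolate)
r_jerky = prediction_likelihood_rewards(jerky, recolor, extrapolate)
print(sum(r_smooth) > sum(r_jerky))  # True
```

In this toy setup, the agent whose recolored frames follow the predicted expert-domain dynamics receives a higher total reward than the oscillating one, even though neither agent's raw frames share the expert's color. Mapping into the expert domain before scoring is what lets an appearance gap be ignored while motion still matters.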

Problem

Research questions and friction points this paper is trying to address.

reinforcement learning

cross-domain

video prediction

domain gap

expert videos

Innovation

Methods, ideas, or system contributions that make the work stand out.

cross-domain reinforcement learning

video prediction model

reward modeling