RDGen: Demonstration Generation for High-Quality Robot Learning via Reinforcement Learning

📅 2026-05-29

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

High-quality robot trajectory data are scarce, and human teleoperation is costly, severely limiting the performance of vision-language-action (VLA) models. To address this challenge, this work proposes RDGen, a novel framework that repurposes sim-to-real reinforcement learning policies as structured trajectory generators rather than final control policies. By integrating task parsing from vision-language models with object localization via Grounding DINO, RDGen efficiently produces smooth, high-success demonstration trajectories on real robots. Experimental results demonstrate that the generated data substantially enhance the performance of downstream VLA models, establishing a scalable paradigm for robotic imitation learning.
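The summary does not say how trajectory smoothness is measured. One common proxy is mean squared jerk, the average squared third time derivative of position, estimated with finite differences. The sketch below is an illustrative assumption, not the paper's metric; the function name `mean_squared_jerk` and the uniform sampling interval `dt` are hypothetical.

```python
from typing import Sequence


def mean_squared_jerk(positions: Sequence[Sequence[float]], dt: float) -> float:
    """Average squared jerk of a uniformly sampled trajectory.

    Assumption: this is a generic smoothness proxy, not RDGen's own metric.
    Lower values indicate a smoother trajectory.
    """
    if dt <= 0:
        raise ValueError("dt must be positive")
    if len(positions) < 4:
        raise ValueError("need at least 4 samples for a third difference")

    total = 0.0
    windows = len(positions) - 3
    for i in range(windows):
        p0, p1, p2, p3 = positions[i:i + 4]
        squared_norm = 0.0
        # Third-order forward difference per coordinate axis.
        for a, b, c, d in zip(p0, p1, p2, p3):
            jerk = (d - 3 * c + 3 * b - a) / dt ** 3
            squared_norm += jerk * jerk
        total += squared_norm
    return total / windows


# A constant-velocity path has zero jerk; a jittery one does not.
smooth_path = [(0.0,), (1.0,), (2.0,), (3.0,), (4.0,)]
jittery_path = [(0.0,), (1.2,), (1.8,), (3.2,), (4.0,)]
```

Under this proxy, a teleoperated trajectory with hand tremor would score higher than a policy rollout that follows a steadier path, which is the kind of comparison the summary describes.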

📝 Abstract

Vision-Language-Action (VLA) models have emerged as a promising paradigm for general-purpose robot control. However, their performance remains fundamentally constrained by the availability of high-quality robot trajectory data. In current robot learning practice, such data are primarily collected through human teleoperation, which is labor-intensive, costly, and difficult to scale. In this paper, we propose RDGen, a sim-to-real reinforcement learning framework for generating high-quality robot demonstrations. Rather than employing reinforcement learning solely as the final control policy, RDGen leverages trained RL policies as a structured trajectory generator. The system consists of a VLM-based task parser that identifies task-relevant objects, a Grounding DINO-based object localizer, and an RL policy transferred from simulation to the real robot. Successful rollouts are then harvested as clean, high-quality demonstrations for downstream VLA training, while the simulation stage further provides a scalable source of additional trajectories at little marginal cost. Experiments on a pick-and-place task demonstrate that the transferred RL policy achieves a high task success rate. Compared with human teleoperation, RDGen produces significantly smoother trajectories and yields superior downstream VLA performance. These results indicate that RL-generated demonstrations can serve as more reliable and consistent supervisory signals for robot policy learning.
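The pipeline described in the abstract (parse the task, localize objects, roll out the RL policy, keep only successful rollouts) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `Rollout`, `harvest_demonstrations`, and the stub components are hypothetical names, and the stubs stand in for the VLM parser, the Grounding DINO localizer, and the transferred policy.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Pose = Tuple[float, float, float]


@dataclass
class Rollout:
    """One policy execution (hypothetical record format)."""
    instruction: str
    states: List[Pose] = field(default_factory=list)
    success: bool = False


def harvest_demonstrations(
    instruction: str,
    parse_task: Callable[[str], List[str]],
    localize: Callable[[str], Pose],
    run_policy: Callable[[str, Dict[str, Pose]], Rollout],
    num_rollouts: int,
) -> List[Rollout]:
    """Run the policy repeatedly and keep only successful rollouts."""
    object_names = parse_task(instruction)  # stands in for the VLM task parser
    demos: List[Rollout] = []
    for _ in range(num_rollouts):
        # Stands in for the Grounding DINO-based object localizer.
        targets = {name: localize(name) for name in object_names}
        rollout = run_policy(instruction, targets)  # stands in for the RL policy
        if rollout.success:
            demos.append(rollout)
    return demos


# Stub components for illustration only.
def stub_parse_task(instruction: str) -> List[str]:
    return ["block", "bowl"]


def stub_localize(name: str) -> Pose:
    return {"block": (0.3, 0.0, 0.02), "bowl": (0.5, 0.2, 0.0)}[name]


def make_stub_policy() -> Callable[[str, Dict[str, Pose]], Rollout]:
    calls = [0]

    def run(instruction: str, targets: Dict[str, Pose]) -> Rollout:
        calls[0] += 1
        # Pretend every second attempt succeeds.
        return Rollout(instruction, list(targets.values()), calls[0] % 2 == 0)

    return run
```

Calling `harvest_demonstrations("put the block in the bowl", stub_parse_task, stub_localize, make_stub_policy(), 4)` keeps the two successful rollouts out of four. Filtering on success is what makes the harvested set clean: failed attempts never reach downstream VLA training.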

Problem

Research questions and friction points this paper is trying to address.

robot learning

demonstration generation

high-quality trajectories

human teleoperation

data scalability

Innovation

Methods, ideas, or system contributions that make the work stand out.

Demonstration Generation

Reinforcement Learning

Vision-Language-Action Models

Sim-to-Real Transfer

Robot Learning
