🤖 AI Summary
To reduce the reliance of robot Sim2Real transfer on extensive data augmentation and large models, this paper proposes a zero-shot visual reinforcement learning framework. It integrates 3D Gaussian Splatting (GS) into the simulation pipeline, establishing a closed loop: “real-world scene → high-fidelity GS rendering → synchronized physical interaction → real-robot deployment.” The paper introduces mesh-constrained soft-binding GS modeling and a co-editing mechanism between GS and the physics engine, which keeps the rendered view aligned with the simulated actions without additional training. This significantly improves rendering fidelity for unstructured objects and task generalization. In real-world grasping and pick-and-place tasks, success rates exceed 85%. Quantitatively, PSNR and SSIM improve by 3.2 dB and 0.042, respectively, and floating artifacts and motion blur are effectively suppressed.
📝 Abstract
Sim-to-real refers to the process of transferring policies learned in simulation to the real world, which is crucial for practical robotics applications. However, recent sim-to-real methods rely either on large amounts of augmented data or on large learning models, which is inefficient for specific tasks. In recent years, the emergence of radiance field reconstruction methods, especially 3D Gaussian Splatting (GS), has made it possible to reconstruct realistic real-world scenes. To this end, we propose RL-GSBridge, a novel real-to-sim-to-real framework that incorporates 3D Gaussian Splatting into the conventional RL simulation pipeline, enabling zero-shot sim-to-real transfer for vision-based deep reinforcement learning. We introduce a mesh-based 3D GS method with soft binding constraints, which enhances the rendering quality of mesh models. Then, by using a GS editing approach to synchronize the rendering with the physics simulator, RL-GSBridge accurately reflects the visual interactions of the physical robot. Through a series of sim-to-real experiments, including grasping and pick-and-place tasks, we demonstrate that RL-GSBridge maintains a satisfactory success rate in real-world task completion during sim-to-real transfer. Furthermore, a series of rendering metrics and visualization results indicate that the proposed mesh-based 3D GS reduces artifacts in unstructured objects and renders them more realistically.
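As a rough illustration of what synchronizing the rendering with the physics simulator can involve, the sketch below moves a set of 3D Gaussians by a rigid-body pose. It is not the paper's implementation: the function name `transform_gaussians` and the array layout are assumptions made for this example, and the pose is assumed to come from the simulator as a rotation matrix and a translation vector. It relies only on the standard fact that, under a rigid transform, a Gaussian's center maps to R·μ + t and its covariance to R·Σ·Rᵀ.

```python
import numpy as np


def transform_gaussians(means, covs, rotation, translation):
    """Apply a rigid transform to Gaussian centers and covariances.

    means:       (N, 3) Gaussian centers in the object's original frame
    covs:        (N, 3, 3) Gaussian covariance matrices
    rotation:    (3, 3) rotation matrix (hypothetical pose from a simulator)
    translation: (3,) translation vector (hypothetical pose from a simulator)
    """
    # Centers: mu' = R @ mu + t, written for row vectors.
    new_means = means @ rotation.T + translation
    # Covariances: Sigma' = R @ Sigma @ R^T, broadcast over the N Gaussians.
    new_covs = rotation @ covs @ rotation.T
    return new_means, new_covs


# Example: rotate one Gaussian by 90 degrees about the z-axis, then shift it.
rot_z_90 = np.array([[0.0, -1.0, 0.0],
                     [1.0, 0.0, 0.0],
                     [0.0, 0.0, 1.0]])
shift = np.array([0.5, 0.0, 0.0])

means = np.array([[1.0, 0.0, 0.0]])
covs = np.array([np.diag([4.0, 1.0, 1.0])])

moved_means, moved_covs = transform_gaussians(means, covs, rot_z_90, shift)
```

In this example the Gaussian, originally elongated along x, ends up elongated along y, so the rendered object would follow the pose reported by the simulator. A full pipeline would also need to handle per-Gaussian attributes such as view-dependent color, which this sketch leaves out.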