🤖 AI Summary
This work addresses a limitation of vision-language-action (VLA) models on contact-rich tasks: visual observations alone often miss the local contact state. The authors propose TacCoRL, a scalable framework that injects tactile feedback into VLA policies and refines them in a real-aligned, closed-loop simulator, without requiring large-scale tactile pretraining or extensive real-world contact exploration. The approach first warm-starts tactile-conditioned actions on mixed simulated and real trajectories, then applies reinforcement learning with verifiable task rewards on simulated contact rollouts, while a supervised objective on real trajectories keeps the policy anchored to the deployment visual, tactile, and action distributions. The resulting policy transfers directly to the real robot without privileged simulation state or online real-world RL. On four bimanual contact-rich tasks, it achieves an average success rate of 72.5%, 22.5 percentage points above the 50.0% baseline.
📝 Abstract
Vision-language-action (VLA) models provide strong visual, language, and action priors for robot manipulation, but visual observations alone often miss the local contact state required for contact-rich tasks. We present TacCoRL, a scalable framework that injects Tactile feedback into VLA policies and improves them through sim-real Co-training and simulation-based reinforcement learning (RL), without requiring large-scale tactile pretraining or extensive real-world contact exploration. The key idea is not only to add touch as an input, but to learn how contact readings should modulate action responses in near-failure states that are rare in demonstrations and risky to collect on hardware. We use a real-aligned simulator as a closed-loop training environment for contact interaction. Mixed simulated and real trajectories first warm-start tactile-conditioned actions in the pretrained policy. Reinforcement learning with verifiable task rewards then optimizes the policy on simulated contact rollouts. This stage reinforces tactile-conditioned actions that lead to task completion, while a supervised objective on real trajectories keeps the refined policy anchored to the deployment visual, tactile, and action distributions. The resulting policy transfers directly to the real robot without privileged simulation state or online real-world RL. Across four bimanual contact-rich tasks, the final visuo-tactile policy achieves an average success rate of 72.5%, compared with 50.0% for the baseline. Result videos and more details are available at https://tac-corl.github.io/
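The abstract does not specify the RL algorithm or the exact form of the loss, so the following is only a minimal sketch of how a task-reward objective on simulated rollouts might be combined with a supervised anchor on real trajectories. It assumes a REINFORCE-style surrogate with a mean-reward baseline and a mean-squared-error imitation term; the function names and the `bc_weight` parameter are hypothetical and are not taken from the paper.

```python
def reinforce_loss(rollout_log_probs, rollout_rewards):
    """Score-function surrogate over simulated rollouts (assumed form).

    rollout_log_probs: one list of per-step action log-probabilities per rollout.
    rollout_rewards:   one verifiable task reward per rollout (e.g. 1.0 on success).
    """
    baseline = sum(rollout_rewards) / len(rollout_rewards)  # variance reduction
    total = 0.0
    for log_probs, reward in zip(rollout_log_probs, rollout_rewards):
        # Rollouts that beat the baseline get their actions' log-probs pushed up.
        total += -(reward - baseline) * sum(log_probs)
    return total / len(rollout_rewards)


def behavior_cloning_loss(predicted_actions, demo_actions):
    """Mean squared error between predicted and demonstrated real actions."""
    squared_errors = [
        (p - d) ** 2
        for pred, demo in zip(predicted_actions, demo_actions)
        for p, d in zip(pred, demo)
    ]
    return sum(squared_errors) / len(squared_errors)


def combined_loss(rollout_log_probs, rollout_rewards,
                  predicted_actions, demo_actions, bc_weight=1.0):
    """RL term on simulated rollouts plus a weighted supervised anchor on real data."""
    rl_term = reinforce_loss(rollout_log_probs, rollout_rewards)
    bc_term = behavior_cloning_loss(predicted_actions, demo_actions)
    return rl_term + bc_weight * bc_term


# Toy usage: two simulated rollouts (one success, one failure)
# and a single real demonstration step with a 2-D action.
loss = combined_loss(
    rollout_log_probs=[[-1.0, -0.5], [-2.0, -1.0]],
    rollout_rewards=[1.0, 0.0],
    predicted_actions=[[0.0, 1.0]],
    demo_actions=[[0.0, 0.0]],
    bc_weight=0.5,
)
```

In this sketch the supervised term plays the anchoring role described above: raising `bc_weight` keeps the policy closer to the real trajectories, while lowering it lets the simulated task reward dominate. How the actual method balances the two is not stated in the abstract.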