TORL-VLA: Tactile Guided Online Reinforcement Learning for Contact-Rich Manipulation

📅 2026-06-08

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the lack of online adaptation in existing vision-language-action (VLA) models on contact-rich manipulation tasks, where shifts in contact conditions lead to inappropriate contact forces and inefficient retries. The authors propose TORL-VLA, a tactile-guided online reinforcement learning framework in which a wrench-aware VLA predicts reference actions and future wrench sequences from tactile input, and a lightweight online RL module refines those reference actions. To stabilize learning from a mix of exploratory policy data and human-intervention data, the framework adds an intervention-censored critic that prevents success achieved after a human intervention from being credited to the policy actions that preceded it. In real-robot experiments on long-horizon tasks (latch manipulation, coffee-cup placement, and egg handling), the method improves subtask and full-task success rates, as well as time-bounded execution efficiency, over strong baselines.

📝 Abstract

Vision-Language-Action (VLA) models have become a powerful framework for robotic manipulation, and recent studies have introduced tactile or force feedback into VLAs to address contact-rich tasks. However, these models are typically deployed as offline policies. When contact conditions shift away from the training distribution, the policy cannot adapt online, leading to problems such as inappropriate contact forces and inefficient retries. We therefore propose TORL-VLA, a tactile-guided online reinforcement learning framework that couples tactile feedback with policy refinement for contact-rich manipulation. Our method introduces a tactile-derived wrench-aware VLA to predict reference actions and future wrench sequences, while a lightweight online RL module refines the reference actions. To stabilize learning from a mix of exploratory policy-generated data and human-intervention data, we introduce an intervention-censored critic that prevents post-intervention success from being wrongly credited to policy-generated actions preceding the intervention. Real-robot experiments on long-horizon contact-rich tasks, including latch manipulation, coffee-cup placement, and egg handling, show that TORL-VLA improves success rates at both subtask and full-task levels, as well as time-bounded execution efficiency, over strong baselines.
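
The credit-assignment idea behind the intervention-censored critic can be illustrated with a small sketch. The abstract does not give the critic's actual formulation, so everything here is an assumption made for illustration: the function name `censored_returns`, the per-step boolean `intervened` flag, and the use of plain discounted Monte Carlo returns in place of a learned critic. The sketch only shows one way to stop reward earned after a human takeover from flowing back to the policy-generated steps before it.

```python
from typing import List, Sequence


def censored_returns(
    rewards: Sequence[float],
    intervened: Sequence[bool],
    gamma: float = 0.99,
) -> List[float]:
    """Discounted return per step, censored at policy-to-human handovers.

    Hypothetical illustration, not the paper's critic: whenever a
    policy-generated step (intervened[t] is False) is immediately followed
    by a human-intervention step, the return accumulated from the future
    is dropped, so later success is not credited to the earlier policy
    actions.
    """
    if len(rewards) != len(intervened):
        raise ValueError("rewards and intervened must have the same length")

    returns = [0.0] * len(rewards)
    running = 0.0
    # Walk the episode backward, as in a standard return computation.
    for t in reversed(range(len(rewards))):
        handover = (
            t + 1 < len(rewards) and not intervened[t] and intervened[t + 1]
        )
        if handover:
            # Censor: discard everything earned from the takeover onward.
            running = 0.0
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns
```

For an episode with rewards `[0, 0, 0, 1]`, flags `[False, False, True, True]`, and `gamma=0.9`, the two policy-generated steps receive a return of 0, while the two human-intervention steps receive 0.9 and 1.0. Without censoring, the first two steps would receive 0.729 and 0.81 for a success that the human produced.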

Problem

Research questions and friction points this paper is trying to address.

contact-rich manipulation

online adaptation

tactile feedback

offline policy

reinforcement learning

Innovation

Methods, ideas, or system contributions that make the work stand out.

tactile feedback

online reinforcement learning

Vision-Language-Action (VLA)