TORL-VLA: Tactile Guided Online Reinforcement Learning for Contact-Rich Manipulation

📅 2026-06-08
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the limited online adaptation capability of existing vision-language-action (VLA) models in contact-rich manipulation tasks: when contact conditions shift away from the training distribution, an offline policy tends to apply inappropriate contact forces and retry inefficiently. To overcome this, the authors propose a tactile-guided online reinforcement learning framework in which a wrench-aware VLA predicts reference actions and future wrench sequences, and a lightweight online RL module refines those reference actions. The approach further introduces an intervention-censored critic for learning from mixed policy-generated and human-intervention data, so that success achieved after a human intervention is not wrongly credited to the policy actions that preceded it. Real-robot experiments on latch manipulation, coffee-cup placement, and egg handling show improved subtask and full-task success rates, as well as better time-bounded execution efficiency, over strong baselines.
📝 Abstract
Vision-Language-Action (VLA) models have become a powerful framework for robotic manipulation, and recent studies have introduced tactile or force feedback into VLAs to address contact-rich tasks. However, these models are typically deployed as offline policies. When contact conditions shift from the training distribution, the policy cannot perform online adaptation, leading to problems such as inappropriate contact forces and inefficient retries. Therefore, we propose TORL-VLA, a tactile-guided online reinforcement learning framework that couples tactile feedback with policy refinement for contact-rich manipulation. Our method introduces a tactile-derived wrench-aware VLA to predict reference actions and future wrench sequences, while a lightweight online RL module is used to refine the reference actions. To stabilize learning from mixed exploratory policy-generated and human-intervention data, we introduce an intervention-censored critic that prevents post-intervention success from being wrongly credited to policy-generated actions preceding intervention. Real-robot experiments on long-horizon contact-rich tasks, including latch manipulation, coffee-cup placement, and egg handling, show that TORL-VLA improves success rates at both subtask and full-task levels, as well as time-bounded execution efficiency over strong baselines.
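The abstract does not say how the intervention-censored critic is implemented. As an illustration only, the sketch below shows one plausible reading of "censoring": when computing discounted returns over a trajectory that mixes policy steps and human-intervention steps, the return is cut at every handover from policy to human, so later rewards do not flow back to the policy actions before the intervention. The function name `censored_returns`, the `is_human` flags, and the use of Monte Carlo returns are all assumptions made for this example, not details taken from the paper.

```python
from typing import List


def censored_returns(
    rewards: List[float],
    is_human: List[bool],
    gamma: float = 0.99,
) -> List[float]:
    """Discounted returns that do not propagate across a policy -> human handover.

    Hypothetical sketch: rewards[t] is the reward at step t, and is_human[t]
    is True when the action at step t came from a human intervention.
    """
    if len(rewards) != len(is_human):
        raise ValueError("rewards and is_human must have the same length")

    n = len(rewards)
    returns = [0.0] * n
    running = 0.0
    for t in reversed(range(n)):
        # Censor: if the policy acted at t and a human takes over at t + 1,
        # drop the accumulated future return before crediting step t.
        if t + 1 < n and is_human[t + 1] and not is_human[t]:
            running = 0.0
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


# Success (reward 1) arrives only after the human takes over at step 2.
mixed = censored_returns([0, 0, 0, 1], [False, False, True, True], gamma=0.5)
# Same rewards with no intervention, for comparison.
plain = censored_returns([0, 0, 0, 1], [False, False, False, False], gamma=0.5)
print(mixed)  # [0.0, 0.0, 0.5, 1.0]
print(plain)  # [0.125, 0.25, 0.5, 1.0]
```

In the mixed trajectory, the two policy steps before the handover receive zero return, while the human steps keep their credit. A critic trained on such targets would not learn that the pre-intervention policy actions led to success, which is the failure mode the abstract describes.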
Problem

Research questions and friction points this paper is trying to address.

contact-rich manipulation
online adaptation
tactile feedback
offline policy
reinforcement learning
Innovation

Methods, ideas, or system contributions that make the work stand out.

tactile feedback
online reinforcement learning
Vision-Language-Action (VLA)
contact-rich manipulation
intervention-censored critic
Huaihang Zheng
Meituan
Yi Yang
Meituan; State Key Lab of Multimodal Artificial Intelligence Systems, Institute of Automation, CAS
Kai Ma
Meituan
Shenglin Xu
Meituan; Beijing Institute of Technology
Tian Xie
Meituan
Guozheng Li
Meituan; China University of Mining and Technology (Beijing)
Xiangyu Wang
Professor, Curtin University
Civil Engineering, Building Information Modeling, Smart City, Automation and Robotics, Smart
Yiren Ma
Meituan
Si Liu
Fred Hutchinson Cancer Center
Genomics, Biostatistics, Anomaly Detection, Open Category Detection
Yinian Mao
Meituan
Baoxu Liu
Meituan