A Low-Cost Vision-Based Tactile Gripper with Pretraining Learning for Contact-Rich Manipulation

📅 2026-01-31
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the limitations of conventional tactile sensors in contact-rich manipulation, namely their narrow sensing range, low reliability, and high cost, by introducing LVTG, a low-cost, vision-based tactile-integrated gripper. The design incorporates a wide opening angle, a highly wear-resistant skin, and a modular architecture, and is the first to integrate a CLIP-inspired contrastive learning strategy that aligns tactile embeddings with their corresponding visual observations. Combined with the Action Chunking Transformer (ACT) policy network, LVTG significantly outperforms the original ACT approach on tasks such as grasping large, heavy objects, achieving higher task success rates and improved data efficiency. These results validate the effectiveness of the proposed hardware-algorithm co-design paradigm in contact-rich scenarios.
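The summary's core algorithmic idea is a CLIP-style contrastive pretraining step that pulls paired tactile and visual embeddings together in a shared space. Below is a minimal sketch of such a symmetric contrastive (InfoNCE) objective, assuming paired embeddings from separate tactile and visual encoders; the batch size, embedding dimension, and temperature are illustrative assumptions, not values from the paper.

```python
# Minimal sketch of a CLIP-style contrastive objective aligning tactile and
# visual embeddings. Assumes each batch contains paired (tactile, visual)
# observations already encoded to fixed-size vectors; all shapes and the
# temperature are illustrative assumptions.
import torch
import torch.nn.functional as F

def clip_alignment_loss(tactile_emb: torch.Tensor,
                        visual_emb: torch.Tensor,
                        temperature: float = 0.07) -> torch.Tensor:
    """Symmetric InfoNCE over a batch of paired (B, D) embeddings."""
    # L2-normalize so the dot product is cosine similarity.
    t = F.normalize(tactile_emb, dim=-1)
    v = F.normalize(visual_emb, dim=-1)
    # (B, B) similarity matrix; the diagonal holds the matched pairs.
    logits = t @ v.t() / temperature
    targets = torch.arange(logits.size(0), device=logits.device)
    # Contrast in both directions: tactile -> visual and visual -> tactile.
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))

# Example: a batch of 32 paired 256-d embeddings.
loss = clip_alignment_loss(torch.randn(32, 256), torch.randn(32, 256))
```

Minimizing this loss makes each tactile embedding most similar to its own visual counterpart and dissimilar to the other pairs in the batch, which is what yields the shared cross-modal representation the downstream policy consumes.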

📝 Abstract
Robotic manipulation in contact-rich environments remains challenging, particularly when relying on conventional tactile sensors that suffer from limited sensing range, reliability, and cost-effectiveness. In this work, we present LVTG, a low-cost visuo-tactile gripper designed for stable, robust, and efficient physical interaction. Unlike existing visuo-tactile sensors, LVTG enables more effective and stable grasping of larger and heavier everyday objects, thanks to its enlarged tactile sensing area and wider opening angle. Its surface skin is made of a highly wear-resistant material, significantly improving durability and extending operational lifespan. The integration of vision and tactile feedback allows LVTG to provide rich, high-fidelity sensory data, facilitating reliable perception during complex manipulation tasks. Furthermore, LVTG features a modular design that supports rapid maintenance and replacement. To effectively fuse vision and touch, we adopt a CLIP-inspired contrastive learning objective to align tactile embeddings with their corresponding visual observations, enabling a shared cross-modal representation space for visuo-tactile perception. This alignment improves the performance of an Action Chunking Transformer (ACT) policy in contact-rich manipulation, leading to more efficient data collection and more effective policy learning. Compared to the original ACT method, the proposed LVTG with pretraining achieves significantly higher success rates in manipulation tasks.
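The abstract couples this aligned visuo-tactile representation with an Action Chunking Transformer (ACT) policy. As a rough illustration of the ACT side, the sketch below implements chunked action prediction with the temporal ensembling scheme from the original ACT work; the policy stub, chunk length K, decay M, and action dimension are hypothetical placeholders, not values reported in this paper.

```python
# Sketch of ACT-style inference with action chunking and temporal ensembling:
# at each control step the policy predicts a chunk of K future actions, and
# every chunk that still covers the current step is blended with exponential
# weights. `policy`, K, M, and ACTION_DIM are hypothetical placeholders.
import numpy as np

K = 10          # chunk length (assumed)
M = 0.1         # ensembling decay; smaller M incorporates new chunks faster
ACTION_DIM = 7  # e.g., arm joint targets plus a gripper command (assumed)

def policy(obs: np.ndarray) -> np.ndarray:
    """Stand-in for a trained ACT policy; returns a (K, ACTION_DIM) chunk."""
    return np.zeros((K, ACTION_DIM))

buffer = []  # (timestep the chunk was issued at, predicted chunk)

def step_action(obs: np.ndarray, t: int) -> np.ndarray:
    """Return the temporally ensembled action for timestep t."""
    buffer.append((t, policy(obs)))
    # Keep only chunks that still cover timestep t, oldest first.
    buffer[:] = [(t0, c) for t0, c in buffer if t - t0 < K]
    preds = np.stack([chunk[t - t0] for t0, chunk in buffer])
    # Exponential weighting w_i = exp(-M * i), with i = 0 for the oldest
    # prediction, so earlier predictions receive slightly higher weight.
    w = np.exp(-M * np.arange(len(buffer)))
    w /= w.sum()
    return (preds * w[:, None]).sum(axis=0)

# Example rollout over 5 steps with dummy observations.
for t in range(5):
    action = step_action(np.zeros(16), t)
```

At each control step the newest chunk is blended with earlier, still-valid chunks, which smooths the executed trajectory without waiting K steps between policy queries.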
Problem

Research questions and friction points this paper is trying to address.

contact-rich manipulation
tactile sensing
robotic manipulation
visuo-tactile perception
low-cost sensors
Innovation

Methods, ideas, or system contributions that make the work stand out.

visuo-tactile sensing
contrastive learning
modular gripper design
contact-rich manipulation
cross-modal representation
Yaohua Liu
Oak Ridge National Laboratory
Condensed Matter and Materials Physics; Neutron Instrumentation
Binkai Ou
Innovation and Research and Development Department, BoardWare Information System Company Ltd., Macau 999078, China
Zicheng Qiu
School of Artificial Intelligence, Nanjing Agricultural University, Nanjing 210095, China; Guangdong Institute of Intelligence Science and Technology, Hengqin, Zhuhai 519031, Guangdong, China
Ce Hao
National University of Singapore
Yemin Wang
Hengjun Zhang
School of Electronic Engineering and Automation, Guilin University of Electronic Technology, Guilin 541000, China