A Low-Cost Vision-Based Tactile Gripper with Pretraining Learning for Contact-Rich Manipulation

📅 2026-01-31
📈 Citations: 0
Influential: 0

🤖 AI Summary
This work addresses three limitations of conventional tactile sensors in contact-rich manipulation (narrow sensing range, low reliability, and high cost) by introducing LVTG, a low-cost gripper with integrated vision-based tactile sensing. The design combines a wide opening angle, a highly wear-resistant skin, and a modular architecture, and it is paired with a CLIP-inspired contrastive learning strategy that aligns visual and tactile embeddings in a shared space. When combined with the Action Chunking Transformer (ACT) policy network, LVTG with pretraining significantly outperforms the original ACT approach in tasks such as grasping large, heavy objects, achieving higher task success rates and improved data efficiency. These results support the proposed hardware-algorithm co-design approach for contact-rich scenarios.

📝 Abstract
Robotic manipulation in contact-rich environments remains challenging, particularly when relying on conventional tactile sensors that suffer from limited sensing range, reliability, and cost-effectiveness. In this work, we present LVTG, a low-cost visuo-tactile gripper designed for stable, robust, and efficient physical interaction. Unlike existing visuo-tactile sensors, LVTG enables more effective and stable grasping of larger and heavier everyday objects, thanks to its enhanced tactile sensing area and greater opening angle. Its surface skin is made of highly wear-resistant material, significantly improving durability and extending operational lifespan. The integration of vision and tactile feedback allows LVTG to provide rich, high-fidelity sensory data, facilitating reliable perception during complex manipulation tasks. Furthermore, LVTG features a modular design that supports rapid maintenance and replacement. To effectively fuse vision and touch, we adopt a CLIP-inspired contrastive learning objective to align tactile embeddings with their corresponding visual observations, enabling a shared cross-modal representation space for visuo-tactile perception. This alignment improves the performance of an Action Chunking Transformer (ACT) policy in contact-rich manipulation, leading to more efficient data collection and more effective policy learning. Compared to the original ACT method, the proposed LVTG with pretraining achieves significantly higher success rates in manipulation tasks.
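
The abstract does not spell out the exact loss, so the following is only a minimal sketch of what a CLIP-style symmetric contrastive objective over paired tactile and visual embeddings generally looks like. It is not the authors' implementation: the function names, the temperature of 0.07, the embedding size, and the synthetic data are all assumptions made for illustration, and plain Python is used only to keep the example self-contained.

```python
import math
import random


def normalize(vec):
    """Scale a vector to unit length (left unchanged if its norm is zero)."""
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def cross_entropy_diag(logits):
    """Mean cross-entropy where the correct class for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the max for numerical stability
        log_z = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_z - row[i]
    return total / len(logits)


def clip_style_loss(tactile, visual, temperature=0.07):
    """Symmetric contrastive loss over a batch of paired embeddings.

    tactile[i] and visual[i] are assumed to come from the same time step;
    every other pairing in the batch is treated as a negative.
    The temperature value is an illustrative assumption.
    """
    t = [normalize(row) for row in tactile]
    v = [normalize(row) for row in visual]
    # Cosine similarity matrix, scaled by the temperature.
    logits = [[sum(a * b for a, b in zip(ti, vj)) / temperature for vj in v]
              for ti in t]
    logits_t = [list(col) for col in zip(*logits)]  # visual-to-tactile view
    return 0.5 * (cross_entropy_diag(logits) + cross_entropy_diag(logits_t))


# Synthetic stand-ins for encoder outputs (hypothetical data, batch of 8, dim 16).
random.seed(0)
visual = [[random.gauss(0, 1) for _ in range(16)] for _ in range(8)]
tactile_aligned = [[x + 0.05 * random.gauss(0, 1) for x in row] for row in visual]
tactile_shifted = tactile_aligned[1:] + tactile_aligned[:1]  # break the pairing

aligned_loss = clip_style_loss(tactile_aligned, visual)
shifted_loss = clip_style_loss(tactile_shifted, visual)
print(f"aligned: {aligned_loss:.4f}  mismatched: {shifted_loss:.4f}")
```

In this toy setup the loss is lower when each tactile embedding is matched with its own visual embedding than when the pairing is shifted by one, which is the property such a pretraining objective relies on. In a real pipeline the embeddings would come from learned tactile and visual encoders, and the aligned encoders would then feed a downstream policy such as ACT.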
Problem

Research questions and friction points this paper is trying to address.

contact-rich manipulation
tactile sensing
robotic manipulation
visuo-tactile perception
low-cost sensors
Innovation

Methods, ideas, or system contributions that make the work stand out.

visuo-tactile sensing
contrastive learning
modular gripper design
contact-rich manipulation
cross-modal representation