Low Fidelity Visuo-Tactile Pretraining Improves Vision-Only Manipulation Performance

πŸ“… 2024-06-21
πŸ›οΈ arXiv.org
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ“„ PDF
πŸ€– AI Summary
This work addresses the limited generalization and robustness of vision-only manipulation policies in complex embodied tasks. We propose a low-cost visuo-tactile pretraining paradigm. Methodologically, we employ the open-source, low-fidelity BeadSight tactile sensor to capture coarse-grained tactile signals, then perform multi-task imitation learning with a shared encoder architecture across tasks. Crucially, downstream execution requires only visual input: no runtime tactile feedback is needed. Our key contribution is an empirical demonstration that pretraining with even low-fidelity tactile signals significantly enhances vision-only policy performance: success rates improve by up to 65% on USB cable plugging, with consistent gains on a longer-horizon drawer pick-and-place task regardless of whether pretraining used a similar, dissimilar, or identical task. This establishes a pathway toward cost-effective, robust embodied intelligence.
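The core pattern in the summary above, pretraining a shared encoder on fused visuo-tactile input and then disabling the tactile stream at deployment, can be sketched in a few lines. This is an illustrative toy, not the paper's architecture: the dimensions, the concatenation-based fusion, and the zero-masking of the missing tactile modality at inference are all assumptions made for the sketch.

```python
import numpy as np

class SharedEncoderPolicy:
    """Toy sketch of pretrain-with-tactile, deploy-vision-only.

    The fusion rule (concatenation) and zero-masking of the absent
    tactile input at inference are illustrative assumptions.
    """

    def __init__(self, vis_dim=8, tac_dim=4, emb_dim=6, seed=0):
        rng = np.random.default_rng(seed)
        # One shared encoder over the fused (vision + tactile) input.
        self.W = rng.normal(size=(vis_dim + tac_dim, emb_dim))
        self.tac_dim = tac_dim

    def encode(self, vision, tactile=None):
        # Pretraining: tactile features are fused with vision.
        # Deployment: the sensor is disabled, so a zero vector stands in
        # and the same shared encoder runs on vision alone.
        if tactile is None:
            tactile = np.zeros(self.tac_dim)
        fused = np.concatenate([vision, tactile])
        return fused @ self.W

policy = SharedEncoderPolicy()
vis = np.ones(8)
z_pretrain = policy.encode(vis, tactile=np.ones(4))  # visuo-tactile pretraining
z_deploy = policy.encode(vis)                        # vision-only inference
print(z_pretrain.shape, z_deploy.shape)              # both embeddings are (6,)
```

Because the two modes share one encoder, the representation learned with tactile supervision is exactly the one consumed at vision-only inference time, which is what lets the pretraining benefit transfer without runtime tactile hardware.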

πŸ“ Abstract
Tactile perception is essential for real-world manipulation tasks, yet the high cost and fragility of tactile sensors can limit their practicality. In this work, we explore BeadSight (a low-cost, open-source tactile sensor) alongside a tactile pre-training approach, as an alternative to precise, pre-calibrated sensors. By pre-training with the tactile sensor and then disabling it during downstream tasks, we aim to enhance robustness and reduce costs in manipulation systems. We investigate whether tactile pre-training, even with a low-fidelity sensor like BeadSight, can improve the performance of an imitation learning agent on complex manipulation tasks. Through visuo-tactile pre-training on both similar and dissimilar tasks, we analyze its impact on a longer-horizon downstream task. Our experiments show that visuo-tactile pre-training improved performance on a USB cable plugging task by up to 65% with vision-only inference. Additionally, on a longer-horizon drawer pick-and-place task, pre-training (whether on a similar, dissimilar, or identical task) consistently improved performance, highlighting the potential for a large-scale visuo-tactile pre-trained encoder.
Problem

Research questions and friction points this paper is trying to address.

Explores low-cost tactile sensors for manipulation tasks.
Investigates tactile pre-training to enhance vision-only manipulation.
Analyzes impact of visuo-tactile pre-training on complex tasks.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses the low-cost, open-source BeadSight tactile sensor.
Visuo-tactile pre-training enhances manipulation performance.
Pre-trained encoder improves vision-only policy performance.
πŸ”Ž Similar Papers
No similar papers found.
Selam Gano
Department of Mechanical Engineering, Carnegie Mellon University, United States
Abraham George
PhD Student at Carnegie Mellon University
robotics · reinforcement learning · robotic manipulation
A. Farimani
Department of Mechanical Engineering, Carnegie Mellon University, United States