🤖 AI Summary
To address the challenges of deploying physical tactile sensors in robotic dexterous manipulation, namely difficult integration, high cost, and limited robustness, this paper proposes a vision-guided pseudo-tactile paradigm: predicting a sensor's tactile response from a single depth patch. Methodologically, the authors collect and publicly release a dataset of aligned visual-tactile pairs gathered by randomly touching eight basic geometric shapes, and train PseudoTouch, a network that encodes a depth patch into a low-dimensional visual-tactile embedding from which the tactile signal is decoded. Experiments show 84% object recognition accuracy after only ten touches, surpassing a proprioception baseline, and a 32-percentage-point absolute improvement in grasp stability prediction over a baseline relying on partial point cloud data. All code, models, and data are open-sourced.
📝 Abstract
Tactile sensing is vital for human dexterous manipulation; however, it has not been widely used in robotics. Compact, low-cost sensing platforms can facilitate a change, but unlike their popular optical counterparts, they are difficult to deploy in high-fidelity tasks due to their low signal dimensionality and the lack of a simulation model. To overcome these challenges, we introduce PseudoTouch, which links high-dimensional structural information to low-dimensional sensor signals. It does so by learning a low-dimensional visual-tactile embedding, wherein we encode a depth patch from which we decode the tactile signal. We collect and train PseudoTouch on a dataset comprising aligned tactile and visual data pairs obtained through random touching of eight basic geometric shapes. We demonstrate the utility of our trained PseudoTouch model in two downstream tasks: object recognition and grasp stability prediction. In the object recognition task, we evaluate the learned embedding's performance on a set of five basic geometric shapes and five household objects. Using PseudoTouch, we achieve an object recognition accuracy of 84% after just ten touches, surpassing a proprioception baseline. For the grasp stability task, we use ACRONYM labels to train and evaluate a grasp success predictor using PseudoTouch's predictions derived from virtual depth information. Our approach yields a 32% absolute improvement in accuracy compared to a baseline relying on partial point cloud data. We make the data, code, and trained models publicly available at https://pseudotouch.cs.uni-freiburg.de.
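The core pipeline the abstract describes, encoding a depth patch into a low-dimensional embedding and decoding a tactile signal from it, can be sketched as below. This is a minimal illustrative sketch only: the dimensions, the linear maps, and the function names are placeholders, not the paper's actual architecture or learned weights.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions; the abstract does not specify exact sizes.
PATCH_DIM = 32 * 32   # flattened depth patch
EMBED_DIM = 16        # low-dimensional visual-tactile embedding
TACTILE_DIM = 24      # dimensionality of the tactile signal

# Random linear maps stand in for the learned encoder/decoder networks.
W_enc = rng.standard_normal((EMBED_DIM, PATCH_DIM)) * 0.01
W_dec = rng.standard_normal((TACTILE_DIM, EMBED_DIM)) * 0.01

def encode(depth_patch: np.ndarray) -> np.ndarray:
    """Map a flattened depth patch to the shared low-dim embedding."""
    return np.tanh(W_enc @ depth_patch)

def decode(embedding: np.ndarray) -> np.ndarray:
    """Predict the tactile signal from the embedding."""
    return W_dec @ embedding

depth_patch = rng.standard_normal(PATCH_DIM)
tactile_pred = decode(encode(depth_patch))
print(tactile_pred.shape)  # (24,)
```

In the paper's setting, the encoder and decoder would be trained jointly on the aligned visual-tactile pairs so that the embedding predicts the sensor's response; the downstream tasks then consume either the embedding or the decoded signal.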