HapTile: A Haptic-Informed Vision-Tactile-Language-Action Dataset for Contact-Rich Imitation Learning

📅 2026-06-03

🤖 AI Summary
Existing vision-language-action datasets largely lack tactile and force feedback, which limits high-fidelity imitation learning for contact-rich manipulation tasks. To bridge this gap, the authors present HapTile, a multimodal dataset that integrates visual, tactile, linguistic, and action modalities and is collected with real-time haptic feedback delivered to the operator during teleoperation. The collection platform combines custom-designed fingertip tactile sensors, a haptic-feedback teleoperation controller, and a standard, reproducible robotic system. The dataset covers diverse contact-rich everyday tasks such as pick-and-place, folding, pressing, and stacking, each paired with language instructions. Baseline experiments with two models are reported to show that incorporating tactile information improves policy learning on these tasks.
📝 Abstract
Despite the importance of tactile sensing for reliable manipulation, most existing Vision-Language-Action (VLA) datasets remain vision-only, and those that do incorporate tactile information typically lack the joint combination of task diversity, language conditioning, and action trajectories. Furthermore, existing teleoperation pipelines rarely provide haptic feedback to the operator, despite its established role in demonstration quality and manipulation stability. In this work, we present HapTile, a contact-grounded visuotactile manipulation dataset that advances beyond vision-only trajectory datasets by embedding physical interaction sensing at two levels: fingertip tactile feedback at the robot end-effector, and haptic-informed demonstrations at the teleoperator side. The data collection platform integrates haptic feedback directly into the teleoperation controller, enabling the operator to perceive contact interactions in real time. It is built around a standard and reproducible robotic system equipped with custom-designed fingertip tactile sensors. The dataset comprises everyday manipulation tasks spanning a broad range of contact-rich skills, including pick-and-place, folding, pressing, stacking, and other routine activities. Each task is paired with language instructions that condition the policy on the manipulation objective, together with synchronized visuotactile observations and action trajectories. In addition, we provide a benchmarking study on contact-rich policy learning using two baseline models to evaluate the effectiveness of the proposed contact-grounded dataset. The dataset and additional details are available on our website: haptile-dataset.github.io.
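The abstract describes each task as a language instruction paired with synchronized visuotactile observations and action trajectories. As a rough illustration of what such a record could look like, the sketch below aligns tactile samples to camera frames by nearest timestamp. Every name here (`Step`, `Episode`, `nearest_index`, `build_episode`, and all field names) is hypothetical, and nearest-timestamp matching is an assumed synchronization strategy; none of this is taken from the released HapTile format.

```python
from bisect import bisect_left
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Step:
    """One synchronized timestep (all field names are hypothetical)."""
    t: float              # camera timestamp in seconds
    image_id: int         # index of the camera frame
    tactile: List[float]  # tactile reading closest in time to the frame
    action: List[float]   # commanded action recorded for this frame


@dataclass
class Episode:
    """One demonstration: a language instruction plus its timesteps."""
    instruction: str
    steps: List[Step]


def nearest_index(times: Sequence[float], t: float) -> int:
    """Return the index of the entry in sorted `times` closest to `t`."""
    i = bisect_left(times, t)
    if i == 0:
        return 0
    if i == len(times):
        return len(times) - 1
    # Choose the closer neighbour; ties go to the earlier sample.
    return i - 1 if t - times[i - 1] <= times[i] - t else i


def build_episode(
    instruction: str,
    cam_times: Sequence[float],
    tactile_times: Sequence[float],
    tactile_values: Sequence[List[float]],
    actions: Sequence[List[float]],
) -> Episode:
    """Pair every camera frame with its nearest tactile sample (assumed scheme)."""
    steps = []
    for k, t in enumerate(cam_times):
        j = nearest_index(tactile_times, t)
        steps.append(
            Step(t=t, image_id=k, tactile=tactile_values[j], action=actions[k])
        )
    return Episode(instruction=instruction, steps=steps)
```

A policy trained on such records would consume the instruction once per episode and one `Step` per control tick. Tactile sensors often sample faster than cameras, which is why this sketch treats the camera clock as the reference, but that choice is also an assumption.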
Problem

Research questions and friction points this paper is trying to address.

tactile sensing
vision-language-action dataset
haptic feedback
contact-rich manipulation
imitation learning
Innovation

Methods, ideas, or system contributions that make the work stand out.

haptic feedback
visuotactile sensing
contact-rich manipulation
imitation learning
teleoperation