Tac-DINO: Learning Vision-Tactile Features with Patch Alignment

📅 2026-06-10
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
This work addresses the lack of research on alignment mechanisms between visual and tactile signals across local-to-global scales, as well as the absence of high-quality datasets and evaluation benchmarks in tactile learning. To bridge this gap, the authors introduce a large-scale tactile dataset comprising 505 real-world objects and over 20,000 tactile contacts, along with a vision-tactile holographic matching benchmark. They further propose Vision-Tactile Patch Alignment (VTPA), a method that aligns tactile signals with image patches to model the localized nature of tactile contact. Experimental results show that VTPA outperforms baseline approaches that use no alignment or whole-image alignment, supporting the use of local alignment strategies for cross-modal correspondence.
๐Ÿ“ Abstract
Touch is the primary medium through which humans interact with the environment. Currently, tactile learning mainly focuses on image-level pretraining or alignment. However, tactile signals correspond to local object contact, while research into scale alignment and holographic matching remains limited and proper datasets and benchmarks also lack. To bridge this gap, we first construct a data collection system to acquire a large-scale tactile dataset, with over 20 K tactile contacts from 505 real-world objects. Building on this dataset, we design a Vis-Tac Holographic Matching Benchmark to evaluate vision-tactile local-to-global alignment ability. Then we propose Vision-Tactile Patch Alignment (VTPA) methods for vision-tactile representation learning. Experiments demonstrate that these exceed the performance of methods without alignment and align with whole-object images.
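To make the idea of patch-level alignment concrete, the following is a minimal sketch, not the authors' VTPA implementation. It assumes each tactile contact comes with a known index of the image patch where contact occurred, and it uses an InfoNCE-style contrastive loss over the patches of the same image. The function names (`patch_alignment_loss`, `best_patch`), the `contact_idx` labels, the temperature value, and the toy embeddings are all hypothetical and chosen only for illustration.

```python
import math


def normalize(v):
    """Scale a vector to unit length so dot products are cosine similarities."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def patch_alignment_loss(tactile, patches, contact_idx, temperature=0.1):
    """InfoNCE-style loss pulling each tactile embedding toward its contacted patch.

    tactile:     list of B tactile embeddings, each a list of D floats
    patches:     list of B images, each a list of K patch embeddings
    contact_idx: list of B ints, index of the contacted patch (assumed label)
    """
    total = 0.0
    for t, img, k in zip(tactile, patches, contact_idx):
        t = normalize(t)
        # Cosine similarity between the tactile embedding and every patch.
        logits = [dot(t, normalize(p)) / temperature for p in img]
        # Numerically stable log-sum-exp over all patches of this image.
        m = max(logits)
        log_z = m + math.log(sum(math.exp(l - m) for l in logits))
        # Cross-entropy with the contacted patch as the positive.
        total += log_z - logits[k]
    return total / len(tactile)


def best_patch(t, img):
    """Return the index of the patch most similar to a tactile embedding."""
    t = normalize(t)
    sims = [dot(t, normalize(p)) for p in img]
    return max(range(len(sims)), key=sims.__getitem__)


# Toy data: 2 contacts, 3 patches per image, 2-dimensional embeddings.
tactile = [[1.0, 0.0], [0.0, 1.0]]
patches = [
    [[1.0, 0.1], [0.0, 1.0], [-1.0, 0.0]],
    [[1.0, 0.0], [0.1, 1.0], [0.0, -1.0]],
]

aligned = patch_alignment_loss(tactile, patches, [0, 1])
misaligned = patch_alignment_loss(tactile, patches, [2, 2])
print(aligned < misaligned)
print(best_patch(tactile[0], patches[0]), best_patch(tactile[1], patches[1]))
```

In this toy setting the loss is lower when the labeled patch is the one most similar to the tactile embedding. A whole-image baseline would instead pool all patch embeddings into one vector before comparison, which discards the location of contact.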
Problem

Research questions and friction points this paper is trying to address.

tactile learning
vision-tactile alignment
patch alignment
holographic matching
tactile dataset
Innovation

Methods, ideas, or system contributions that make the work stand out.

Vision-Tactile Alignment
Patch-level Representation
Tactile Dataset
Holographic Matching Benchmark
Multimodal Learning