🤖 AI Summary
Existing tactile commonsense reasoning systems struggle to generalize to open-world scenarios because of limited data scale and insufficient modeling of the action-dependence and redundancy inherent in tactile signals. To address this, this work introduces TouchThinker-1M, a million-scale, multi-source tactile reasoning dataset, along with TouchThinker-Bench, an open-world evaluation benchmark. It further proposes an action-aware tactile representation method, built within a tactile-language framework, that explicitly models action context to improve both representational efficiency and semantic expressiveness. The proposed approach achieves competitive performance against state-of-the-art models across multiple datasets, providing a scalable data and modeling foundation for tactile-driven embodied intelligence.
📝 Abstract
Touch is a key modality for embodied agents to understand the physical world. Although recent work has incorporated tactile signals into language systems for tactile commonsense reasoning, scaling such systems to realistic open-world settings remains challenging due to two key bottlenecks: (1) current tactile reasoning datasets remain limited in format and scale, providing insufficient supervision for reasoning from tactile observations to physical commonsense and hindering the learning of transferable tactile commonsense; (2) tactile signals are inherently redundant and action-specific, yet existing methods often overlook these properties, resulting in inefficient representations with limited semantic expressiveness. To address these limitations, we propose TouchThinker, a tactile-language framework that scales tactile commonsense reasoning to the open world from both data and representation perspectives. First, we construct TouchThinker-1M, a million-scale, multi-source tactile reasoning dataset covering 415 objects, 8 scenarios, and 7 sensor types, providing a solid data foundation for open-world generalization. We further introduce TouchThinker-Bench, an open-world benchmark with more realistic and diverse tasks. Second, we propose an action-aware modeling mechanism that improves the efficiency of tactile representations and enables efficient reasoning. Experimental results demonstrate that TouchThinker achieves competitive performance against state-of-the-art models across multiple datasets. Our code and dataset will be made available at: https://github.com/lvkailin0118/TouchThinker.