TouchThinker: Scaling Tactile Commonsense Reasoning to the Open World with Large-scale Data and Action-aware Representation

📅 2026-06-09
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing tactile commonsense reasoning systems struggle to generalize to open-world scenarios because of limited data scale and insufficient modeling of the action-dependence and redundancy inherent in tactile signals. To address this, the work introduces TouchThinker-1M, a million-scale, multi-source tactile reasoning dataset, along with TouchThinker-Bench, an open-world evaluation benchmark. It also proposes an action-aware tactile representation within a tactile-language framework, aimed at improving both representational efficiency and semantic expressiveness. The reported results are competitive with state-of-the-art models across multiple datasets, offering a scalable data and modeling foundation for tactile-driven embodied intelligence.
📝 Abstract
Touch is a key modality for embodied agents to understand the physical world. Although recent work has incorporated tactile signals into language systems for tactile commonsense reasoning, scaling such systems to realistic open-world settings remains challenging due to two key bottlenecks: (1) current tactile reasoning datasets remain limited in format and scale, providing insufficient supervision for reasoning from tactile observations to physical commonsense and hindering the learning of transferable tactile commonsense; (2) tactile signals are inherently redundant and action-specific, yet existing methods often overlook these properties, resulting in inefficient representations with limited semantic expressiveness. To address these limitations, we propose TouchThinker, a tactile-language framework that scales tactile commonsense reasoning to the open world from both data and representation perspectives. First, we construct TouchThinker-1M, a million-scale, multi-source tactile reasoning dataset covering 415 objects, 8 scenarios, and 7 sensor types, providing a solid data foundation for open-world generalization. We further introduce TouchThinker-Bench, an open-world benchmark with more realistic and diverse tasks. Then, we propose an action-aware modeling mechanism to improve tactile representation efficiency and enable efficient reasoning. Experimental results demonstrate that TouchThinker achieves competitive performance against state-of-the-art models across multiple datasets. Our code and dataset will be made available at: https://github.com/lvkailin0118/TouchThinker.
Problem

Research questions and friction points this paper is trying to address.

tactile commonsense reasoning
open-world generalization
tactile representation
action-aware modeling
large-scale dataset
Innovation

Methods, ideas, or system contributions that make the work stand out.

tactile commonsense reasoning
large-scale dataset
action-aware representation
open-world generalization
embodied AI
Authors
Kailin Lyu
Institute of Automation, Chinese Academy of Sciences
Di Wu
Institute of Automation, Chinese Academy of Sciences
Pengwei Zhang
Institute of Automation, Chinese Academy of Sciences
Yuhang Zheng
NUS; TARS AI
Robotics, 3D Vision
Yingxin Lai
Xiamen University
Long Xiao
University of Cambridge, Engineering Department, Cavendish Laboratory
Graphene, Photonics, Terahertz, Communication System
Kangyi Wu
Xi’an Jiaotong University
Pengna Li
Xi’an Jiaotong University
Chen Gao
National University of Singapore
Lianyu Hu
Nanyang Technological University
Xiaobin Hu
Tencent Youtu Lab; Technische Universität München (TUM)
Deep learning, Computer vision, VLM, Agents
Jie Hao
Institute of Automation, Chinese Academy of Sciences
Ce Hao
National University of Singapore
Weihao Yuan
Hong Kong University of Science and Technology
3D Vision, Embodied AI, Robot Reinforcement Learning
Shuicheng Yan
National University of Singapore