🤖 AI Summary
In dexterous in-hand manipulation, long-horizon tasks suffer from "last-centimeter" pose estimation errors: small initial pose deviations accumulate over execution and degrade final precision.
Method: We propose a tactile-only framework for fine-grained in-hand object pose adjustment that eliminates reliance on vision and supports arbitrary target poses. Our approach employs a multi-branch policy network integrating tactile and proprioceptive feedback, pretrained on large-scale MuJoCo simulation data and fine-tuned with a small amount of real-world tactile data to enable closed-loop iterative refinement.
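The multi-branch fusion described above can be sketched as follows. This is a minimal illustrative mock-up, not the paper's actual architecture: the number of fingers, feature dimensions, layer widths, and the 6-DoF delta-pose output are all assumptions, and random weights stand in for trained ones.

```python
import numpy as np

def mlp(x, weights):
    """Tiny fully-connected branch: linear layers with ReLU on hidden layers."""
    for i, (W, b) in enumerate(weights):
        x = x @ W + b
        if i < len(weights) - 1:
            x = np.maximum(x, 0.0)  # ReLU
    return x

def init_mlp(sizes, rng):
    """Random (untrained) weights; a real system would learn these."""
    return [(rng.standard_normal((m, n)) * 0.1, np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

rng = np.random.default_rng(0)
N_FINGERS, TACTILE_DIM, PROPRIO_DIM = 4, 64, 16  # assumed dimensions

# One tactile branch per fingertip, plus one proprioception branch.
tactile_branches = [init_mlp([TACTILE_DIM, 32, 16], rng) for _ in range(N_FINGERS)]
proprio_branch = init_mlp([PROPRIO_DIM, 32, 16], rng)
# Fusion head maps the concatenated branch features to a 6-DoF pose update
# (3 translation + 3 rotation components) for the end effector.
head = init_mlp([16 * (N_FINGERS + 1), 64, 6], rng)

def policy(tactile_per_finger, proprio):
    feats = [mlp(t, br) for t, br in zip(tactile_per_finger, tactile_branches)]
    feats.append(mlp(proprio, proprio_branch))
    return mlp(np.concatenate(feats), head)

delta_pose = policy([rng.standard_normal(TACTILE_DIM) for _ in range(N_FINGERS)],
                    rng.standard_normal(PROPRIO_DIM))
print(delta_pose.shape)  # (6,)
```

The per-finger branches let each fingertip's tactile reading be encoded independently before fusion, which is one common way to combine heterogeneous sensor streams.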
Results: Experiments demonstrate millimeter-level grasping accuracy using tactile input alone in real-world settings. Joint simulation-to-real training significantly enhances robustness, achieving, for the first time, vision-free, purely tactile-driven in-hand pose self-correction under arbitrary target orientations. This work establishes a novel paradigm for visionless dexterous manipulation.
📝 Abstract
Despite progress in both traditional dexterous grasping pipelines and recent Vision-Language-Action (VLA) approaches, the grasp execution stage remains prone to pose inaccuracies, especially in long-horizon tasks, which undermines overall performance. To address this "last-mile" challenge, we propose TacRefineNet, a tactile-only framework that achieves fine in-hand pose refinement of known objects in arbitrary target poses using multi-finger fingertip sensing. Our method iteratively adjusts the end-effector pose based on tactile feedback, aligning the object to the desired configuration. We design a multi-branch policy network that fuses tactile inputs from multiple fingers with proprioception to predict precise control updates. To train this policy, we combine large-scale simulated data from a physics-based tactile model in MuJoCo with real-world data collected from a physical system. Comparative experiments show that pretraining on simulated data and fine-tuning with a small amount of real data significantly improves performance over simulation-only training. Extensive real-world experiments validate the effectiveness of the method, achieving millimeter-level grasp accuracy using only tactile input. To our knowledge, this is the first method to enable arbitrary in-hand pose refinement via multi-finger tactile sensing alone. The project website is available at https://sites.google.com/view/tacrefinenet
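The closed-loop refinement described in the abstract can be sketched as a simple feedback loop. This is a hedged toy model: a proportional stub stands in for the learned tactile policy, the pose error is represented as a plain 3-vector in millimeters, and the 1 mm stopping tolerance is an illustrative assumption rather than the paper's reported figure.

```python
import numpy as np

def stub_policy(pose_error):
    """Stand-in for the learned policy: predict a correction from feedback.

    A trained network would map tactile + proprioceptive observations to
    this correction; here we simply move halfway toward the target."""
    return -0.5 * pose_error

def refine(initial_error_mm, tol_mm=1.0, max_steps=20):
    """Iteratively apply predicted corrections until within tolerance."""
    error = np.asarray(initial_error_mm, dtype=float)
    for step in range(max_steps):
        if np.linalg.norm(error) < tol_mm:
            return error, step            # millimeter-level accuracy reached
        error = error + stub_policy(error)  # apply the predicted pose update
    return error, max_steps

final_error, steps = refine([8.0, -5.0, 3.0])
print(np.linalg.norm(final_error), steps)
```

The key structural point is that the policy only ever sees the current feedback, so errors left by one correction are picked up and reduced by the next iteration.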