🤖 AI Summary
This work addresses inconsistent state representations across heterogeneous dexterous hands, which arise from differences in structure and degrees of freedom (DoF) and hinder joint training. The authors propose a Unified Dexterous Hand Model (UDHM), which maps both human and robotic hands onto a shared 22-DoF semantic interface. They further introduce UniDexTok, a retargeting-free tokenizer that learns embodiment-conditioned discrete tokens directly from standardized real joint states, enabling a unified cross-embodiment representation. To the best of our knowledge, this is the first method to achieve cross-embodiment dexterous hand state tokenization without simulation or retargeting data, and it supports high-fidelity zero-shot and few-shot reconstruction. Compared with the UniHM baseline, the approach reduces MPJAE from 15.63° to 0.16° and MPJPE from 18.51 mm to 0.18 mm (error reductions of 98.98% and 99.03%, respectively), moving reconstruction from centimeter-scale to sub-millimeter accuracy. The experiments also show that data from other embodiments improves reconstruction accuracy on the target embodiment.
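The two reported metrics are not defined in this summary. The sketch below assumes the common readings: MPJAE as the mean absolute per-joint angle error in degrees, and MPJPE as the mean Euclidean distance between corresponding 3D joint positions in millimeters. The function names and toy inputs are illustrative assumptions, not the authors' evaluation code, and angle wraparound is ignored on the assumption that joint angles stay within their limits.

```python
import math


def mpjae_deg(pred_angles, true_angles):
    """Mean absolute per-joint angle error (degrees), assumed definition."""
    if not pred_angles or len(pred_angles) != len(true_angles):
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs(p - t) for p, t in zip(pred_angles, true_angles))
    return total / len(pred_angles)


def mpjpe_mm(pred_points, true_points):
    """Mean Euclidean distance between matching 3D joints (mm), assumed definition."""
    if not pred_points or len(pred_points) != len(true_points):
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(math.dist(p, t) for p, t in zip(pred_points, true_points))
    return total / len(pred_points)


# Toy values chosen for easy hand-checking; they are not from the paper.
angle_error = mpjae_deg([10.0, 20.5, 30.0], [10.0, 20.0, 31.0])
position_error = mpjpe_mm([(0.0, 0.0, 0.0), (3.0, 4.0, 0.0)],
                          [(0.0, 0.0, 0.0), (0.0, 0.0, 0.0)])
print(angle_error, position_error)
```

With these toy inputs the per-joint angle errors are 0, 0.5, and 1.0 degrees, and the per-joint distances are 0 and 5 mm, so the two averages are 0.5 degrees and 2.5 mm.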
📝 Abstract
Dexterous hands are essential for fine-grained manipulation, but their hardware designs vary substantially across embodiments. Differences in kinematics, joint definitions, and degrees of freedom make a shared state representation much harder to define than for parallel grippers. As a result, dexterous-hand data remains fragmented and difficult to use for joint training. In this work, we propose the Unified Dexterous Hand Model (UDHM), which maps human and robot hand states into a shared 22-DoF semantic interface. Based on UDHM, we introduce UniDexTok, a retargeting-free state tokenizer that learns embodiment-conditioned discrete tokens from standardized real joint states. UniDexTok provides a unified representation for heterogeneous dexterous hands without relying on retargeting or simulation data. Compared with the recent baseline UniHM, UniDexTok reduces MPJAE from 15.63 degrees to 0.16 degrees and MPJPE from 18.51 mm to 0.18 mm, corresponding to error reductions of 98.98% and 99.03%, respectively. This moves reconstruction accuracy from the centimeter scale to the sub-millimeter scale. Experiments further show that data from other embodiments improves target-embodiment reconstruction accuracy, demonstrating the benefit of cross-embodiment tokenization. UniDexTok also shows strong zero-shot and few-shot reconstruction ability when new dexterous hands are introduced.
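The quoted error reductions follow from the reported before and after values using the usual relative-reduction formula, (baseline - new) / baseline. The short check below reproduces that arithmetic; the helper name `relative_reduction_pct` is an illustrative assumption, and only the four metric values come from the abstract.

```python
def relative_reduction_pct(baseline, new):
    """Relative error reduction from baseline to new, as a percentage."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return 100.0 * (baseline - new) / baseline


# Values reported in the abstract: UniHM baseline vs. UniDexTok.
mpjae_reduction = relative_reduction_pct(15.63, 0.16)  # degrees
mpjpe_reduction = relative_reduction_pct(18.51, 0.18)  # millimeters

print(f"MPJAE reduction: {mpjae_reduction:.2f}%")
print(f"MPJPE reduction: {mpjpe_reduction:.2f}%")
```

Rounded to two decimals, the results are 98.98% and 99.03%, matching the figures stated in the abstract.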