🤖 AI Summary
To address long-standing challenges in high-dimensional dexterous hand control, including low data efficiency, poor task generalization, and difficult sim-to-real transfer, this paper draws inspiration from the internal model mechanism in human motor control and introduces a neural internal model framework for dexterous manipulation, which the authors present as the first of its kind. The approach combines a learnable neuromechanical hand dynamics model with a bidirectional optimal-control planning architecture, and further integrates external vision models and large language models (LLMs) to support both dexterous manipulation planning and natural-language-driven generation of diverse hand gestures. Evaluated on multiple dexterous hands, the framework yields substantial improvements: over 50% fewer training samples required, efficient transfer across tasks and hardware, robust and generalizable manipulation in both simulation and the real world, and LLM-guided, semantically grounded hand pose synthesis.
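Neither the summary nor the abstract gives architectural details of the learnable hand dynamics model; purely as a hedged illustration of the general idea, the sketch below trains a small MLP to predict the next joint state from the current state and action. All names, dimensions, and the MLP/delta-prediction choices are assumptions for illustration, not details from the paper.

```python
# Hypothetical sketch: a learned forward dynamics model for a dexterous hand.
# Nothing here is taken from the paper; names, sizes, and the MLP choice are assumptions.
import torch
import torch.nn as nn

class HandDynamicsModel(nn.Module):
    """Predicts the next joint state s' from (s, a) with a small MLP."""
    def __init__(self, state_dim: int, action_dim: int, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, state_dim),
        )

    def forward(self, state: torch.Tensor, action: torch.Tensor) -> torch.Tensor:
        # Predict a state delta and add it back, a common stabilizing choice.
        return state + self.net(torch.cat([state, action], dim=-1))

def train_step(model, optimizer, states, actions, next_states):
    """One supervised step on a batch of (s, a, s') transitions."""
    pred = model(states, actions)
    loss = nn.functional.mse_loss(pred, next_states)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

if __name__ == "__main__":
    state_dim, action_dim = 24, 24  # assumed joint-space dimensions for illustration
    model = HandDynamicsModel(state_dim, action_dim)
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    # Random placeholder transitions stand in for simulator or real-hand data.
    s, a, s_next = torch.randn(128, state_dim), torch.randn(128, action_dim), torch.randn(128, state_dim)
    print("loss:", train_step(model, opt, s, a, s_next))
```

In practice such a model would be fit on transitions collected from the hand or a simulator; the paper's reported gains in sample efficiency and transferability concern its actual framework, not this toy sketch.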
📝 Abstract
Controlling hands in a high-dimensional action space has been a longstanding challenge, yet humans perform dexterous tasks with ease. In this paper, we draw inspiration from the internal model observed in human behavior and reconsider dexterous hands as learnable systems. Specifically, we introduce MoDex, a framework that pairs neural networks (NNs) capturing the dynamical characteristics of a hand with a bidirectional planning approach, demonstrating both training and planning efficiency. To show the versatility of MoDex, we further integrate it with an external model for manipulating in-hand objects and a large language model (LLM) for generating various gestures, in both simulation and the real world. Extensive experiments on different dexterous hands demonstrate data efficiency in learning new tasks and transferability between tasks.
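The abstract does not describe how the bidirectional planning operates on the learned dynamics model; as a generic, assumed stand-in for model-based planning, the sketch below optimizes an action sequence by backpropagating through any differentiable dynamics callable (such as the hypothetical model sketched above) toward a target hand state. This is a shooting-style illustration, not MoDex's planner.

```python
# Hypothetical sketch: gradient-based planning through a learned dynamics model.
# The paper's bidirectional planner is not described here; this is a generic stand-in.
import torch

def plan_actions(dynamics, init_state, goal_state, action_dim, horizon=10, iters=50, lr=0.05):
    """Optimize a length-`horizon` action sequence so the rollout ends near `goal_state`.

    `dynamics` is any callable mapping (state, action) -> next_state, e.g. a
    learned forward model like the one sketched above.
    """
    actions = torch.zeros(horizon, action_dim, requires_grad=True)
    opt = torch.optim.Adam([actions], lr=lr)
    for _ in range(iters):
        state = init_state
        for t in range(horizon):
            state = dynamics(state, actions[t])   # differentiable rollout
        loss = torch.nn.functional.mse_loss(state, goal_state)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return actions.detach()

if __name__ == "__main__":
    # Stand-in linear dynamics just to make the sketch runnable on its own.
    dummy = lambda s, a: s + 0.1 * a
    s0, goal = torch.zeros(24), torch.ones(24)
    acts = plan_actions(dummy, s0, goal, action_dim=24)
    print(acts.shape)  # torch.Size([10, 24])
```

Sampling-based planners (e.g., CEM) are an equally common choice with learned dynamics; the gradient-based variant is shown only because it keeps the sketch short.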