AI Summary
Functional grasping of complex objects (e.g., tools, household items) remains challenging when target hand poses cannot be achieved in a single step due to geometric or kinematic constraints.
Method: This paper proposes a dexterous pre-grasp manipulation framework for anthropomorphic hands based on end-to-end, demonstration-free deep reinforcement learning. It uses a single policy shared across multiple object categories, a novel dense multi-component reward function, and two alternative grasp representations: an explicit one and a more abstract, constraint-based one. Training uses the PPO algorithm with high-fidelity hand dynamics modeling and completes in under three hours on a single GPU.
Contribution/Results: The learned policy autonomously repositions and reorients objects before grasping, generalizes to unseen instances of the trained object categories, and achieves high success rates in functional grasping, all without expert demonstrations or category-specific fine-tuning.
Abstract
Many objects, such as tools and household items, can be used only if grasped in a very specific way, i.e., grasped functionally. Often, however, a direct functional grasp is not possible. We propose a method for learning a dexterous pre-grasp manipulation policy to achieve human-like functional grasps using deep reinforcement learning. We introduce a dense multi-component reward function that enables learning a single policy capable of dexterous pre-grasp manipulation of novel instances of several known object categories with an anthropomorphic hand. The policy is learned purely by means of reinforcement learning from scratch, without any expert demonstrations. It implicitly learns to reposition and reorient objects of complex shapes to achieve given functional grasps. In addition, we explore two different ways to represent a desired grasp: an explicit representation and a more abstract, constraint-based one. We show that our method consistently learns to successfully manipulate and achieve desired grasps on previously unseen object instances of known categories using both grasp representations. Training is completed on a single GPU in under three hours.
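The abstract does not list the components of the dense reward. Purely as a generic illustration of what "dense multi-component" means, such rewards are commonly a weighted sum of per-step shaping terms plus a sparse success bonus. The sketch below assumes hypothetical terms (palm-to-object distance and joint-space distance to a target grasp), hypothetical weights, and a hypothetical function name `dense_grasp_reward`; it is not the paper's reward.

```python
import math
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class RewardWeights:
    """Hypothetical weights for each reward component (not from the paper)."""
    reach: float = 1.0
    pose: float = 1.0
    success_bonus: float = 10.0


def _dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dense_grasp_reward(
    palm_pos: Sequence[float],
    object_pos: Sequence[float],
    hand_joints: Sequence[float],
    target_joints: Sequence[float],
    weights: Optional[RewardWeights] = None,
    success_tol: float = 0.1,
) -> float:
    """Hypothetical dense reward: weighted shaping terms plus a success bonus.

    reach term: in (0, 1], largest when the palm is at the object
    pose term:  in (0, 1], largest when the joints match the target grasp
    bonus:      paid only when the joint-space grasp error is below success_tol
    """
    w = weights if weights is not None else RewardWeights()
    reach_err = _dist(palm_pos, object_pos)
    pose_err = _dist(hand_joints, target_joints)
    # Exponential shaping yields a nonzero, distance-dependent signal at
    # every step, not only when the grasp is finally achieved.
    r_reach = math.exp(-5.0 * reach_err)
    r_pose = math.exp(-2.0 * pose_err)
    bonus = w.success_bonus if pose_err < success_tol else 0.0
    return w.reach * r_reach + w.pose * r_pose + bonus


# At the target grasp: reach = 1, pose = 1, bonus = 10
print(dense_grasp_reward([0, 0, 0], [0, 0, 0], [0.5, 0.5], [0.5, 0.5]))  # 12.0
```

Keeping the components separate and weighted makes each term's influence tunable, while the bounded exponential terms keep per-step rewards on a comparable scale to the terminal bonus.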