AI Summary
Sequential grasping of geometrically heterogeneous objects by high-DOF dexterous hands (e.g., Allegro Hand) faces challenges in modeling complex multi-contact interactions and ensuring grasp stability across successive steps. Method: We propose the first staged grasp synthesis and fusion framework: (1) a single-object subchain-constrained grasp pose generation mechanism; (2) a point-cloud-conditioned diffusion model for synthesizing coordinated multi-object grasp poses; and (3) NVIDIA PhysX-based physical validation coupled with a heuristic two-stage execution strategy. Results: The method achieves a 65.8% average success rate over 1,600 simulated trials and 56.7% over 90 real-world trials, demonstrating robustness across diverse object combinations (8×8 in simulation and 6×3 in the real world). Contribution: This work introduces the first application of conditional diffusion models to sequential multi-object grasp pose generation and establishes a verifiable, transferable staged grasp planning paradigm.
Abstract
Sequentially grasping multiple objects with multi-fingered hands is common in daily life, where humans can fully leverage the dexterity of their hands to enclose multiple objects. However, the diversity of object geometries and the complex contact interactions required for high-DOF hands to grasp one object while enclosing another make sequential multi-object grasping challenging for robots. In this paper, we propose SeqMultiGrasp, a system for sequentially grasping objects with a four-fingered Allegro Hand. We focus on sequentially grasping two objects, ensuring that the hand fully encloses one object before lifting it and then grasps the second object without dropping the first. Our system first synthesizes single-object grasp candidates, where each grasp is constrained to use only a subset of the hand's links. These grasps are then validated in a physics simulator to ensure stability and feasibility. Next, we merge the validated single-object grasp poses to construct multi-object grasp configurations. For real-world deployment, we train a diffusion model conditioned on point clouds to propose grasp poses, followed by a heuristic-based execution strategy. We test our system using $8 \times 8$ object combinations in simulation and $6 \times 3$ object combinations in the real world. Our diffusion-based grasp model obtains an average success rate of 65.8% over 1600 simulation trials and 56.7% over 90 real-world trials, suggesting that it is a promising approach for sequential multi-object grasping with multi-fingered hands. Supplementary material is available on our project website: https://hesic73.github.io/SeqMultiGrasp.
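The reported numbers can be sanity-checked with a little arithmetic. The sketch below is only a back-of-the-envelope illustration: it assumes that trials are spread evenly over all object combinations, which the abstract does not state, and the function and parameter names (`trial_breakdown`, `n_first`, `n_second`) are hypothetical.

```python
# Back-of-the-envelope check of the reported trial counts and success rates.
# Assumption (not stated in the abstract): trials are distributed evenly
# over all object combinations.

def trial_breakdown(n_first, n_second, n_trials, success_rate):
    """Return (combinations, trials per combination, implied successes)."""
    combos = n_first * n_second                  # e.g. 8 x 8 = 64
    per_combo = n_trials / combos                # even split (assumed)
    successes = round(n_trials * success_rate)   # nearest whole trial
    return combos, per_combo, successes

# Figures taken from the abstract.
sim = trial_breakdown(8, 8, 1600, 0.658)
real = trial_breakdown(6, 3, 90, 0.567)

print("simulation:", sim)
print("real world:", real)
```

Under the even-split assumption, this works out to 25 trials per combination in simulation and 5 per combination in the real world, and the stated rates correspond to roughly 1053 and 51 successful trials respectively.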