GraspGen: A Diffusion-based Framework for 6-DOF Grasping with On-Generator Training

📅 2025-07-17
🤖 AI Summary
Existing learning-based 6-DOF grasp synthesis methods generalize poorly across robot embodiments and in-the-wild settings, which keeps them from being turnkey. To address this, the authors propose GraspGen, a framework that models object-centric grasp generation as an iterative diffusion process. It pairs a DiffusionTransformer generator with an efficient discriminator that scores and filters sampled grasps, and it introduces an on-generator training recipe for that discriminator. To scale across objects and grippers, the authors release a new simulated dataset of over 53 million grasps. GraspGen outperforms prior methods in simulation with singulated objects across different grippers, achieves state-of-the-art performance on the FetchBench grasping benchmark, and performs well on a real robot with noisy visual observations.

📝 Abstract
Grasping is a fundamental robot skill, yet despite significant research advancements, learning-based 6-DOF grasping approaches are still not turnkey and struggle to generalize across different embodiments and in-the-wild settings. We build upon the recent success in modeling the object-centric grasp generation process as an iterative diffusion process. Our proposed framework, GraspGen, consists of a DiffusionTransformer architecture that enhances grasp generation, paired with an efficient discriminator to score and filter sampled grasps. We introduce a novel and performant on-generator training recipe for the discriminator. To scale GraspGen to both objects and grippers, we release a new simulated dataset consisting of over 53 million grasps. We demonstrate that GraspGen outperforms prior methods in simulations with singulated objects across different grippers, achieves state-of-the-art performance on the FetchBench grasping benchmark, and performs well on a real robot with noisy visual observations.
Problem

Research questions and friction points this paper is trying to address.

Learning-based 6-DOF grasping lacks generalization across embodiments
Enhancing grasp generation with DiffusionTransformer and discriminator
Scaling grasp generation to diverse objects and grippers
Innovation

Methods, ideas, or system contributions that make the work stand out.

DiffusionTransformer enhances grasp generation
On-generator training for efficient discriminator
Simulated dataset scales to diverse grippers
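
The abstract describes a two-stage pipeline: a generator samples candidate grasps, and a discriminator scores and filters them. The filtering stage can be sketched roughly as below. This is a minimal illustration, not the GraspGen implementation: the function name `filter_grasps`, the `threshold` and `top_k` parameters, and the toy candidates and scores are all assumptions made for the example. In a real system the candidates would be 6-DOF poses sampled by the diffusion model and the scores would come from the learned discriminator.

```python
from typing import List, Sequence, Tuple, TypeVar

G = TypeVar("G")  # any grasp representation, e.g. a 6-DOF pose


def filter_grasps(
    grasps: Sequence[G],
    scores: Sequence[float],
    threshold: float = 0.5,  # hypothetical cutoff, not from the paper
    top_k: int = 10,         # hypothetical budget, not from the paper
) -> List[Tuple[G, float]]:
    """Keep grasps scoring at least `threshold`, best first, at most `top_k`."""
    if len(grasps) != len(scores):
        raise ValueError("grasps and scores must have the same length")
    # Drop candidates the scorer considers unlikely to succeed.
    kept = [(g, s) for g, s in zip(grasps, scores) if s >= threshold]
    # Rank the survivors from highest to lowest score.
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept[:top_k]


# Toy stand-ins for sampled grasps and discriminator scores (assumed values).
candidates = ["grasp_a", "grasp_b", "grasp_c", "grasp_d"]
candidate_scores = [0.91, 0.12, 0.67, 0.40]

best = filter_grasps(candidates, candidate_scores, threshold=0.5, top_k=2)
print(best)
```

Separating sampling from scoring in this way lets the generator propose many diverse candidates while the scorer decides which ones are worth executing.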