🤖 AI Summary
This work addresses the significant performance degradation of existing 6-DoF grasping methods in front-facing constrained scenarios, where robotic workspace and kinematic limitations hinder success. To overcome this, the authors propose a grasp optimization framework that integrates geometric priors from the Minimum Volume Bounding Box (MVBB). The approach first employs an O(N) geometric filter based on MVBB face normals to eliminate infeasible grasp poses, then refines candidate rankings via a rescoring function combining discriminator scores with face-normal alignment. A MuJoCo-based evaluation protocol tailored to specific manipulator kinematics is also introduced. Integrated with YOLOv8 object detection, GraspGen grasp proposal, PCA-based MVBB fitting, and inverse kinematics planning, the system achieves a 59.3% success rate in MuJoCo simulation of the Unitree Z1 arm, 2.4 times the 24.7% success rate of the GraspGen baseline, demonstrating substantially improved reliability for front-view grasping.
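The PCA-based box fitting mentioned above can be illustrated with a minimal sketch. It assumes the object is given as an (N, 3) point cloud and uses NumPy; the function name `fit_pca_box` and its return layout are hypothetical and are not taken from the authors' code. PCA axes only approximate a true minimum-volume box, so this should be read as one plausible reading of "PCA-based MVBB fitting" rather than the authors' implementation.

```python
import numpy as np


def fit_pca_box(points):
    """Fit an oriented box to an (N, 3) point cloud using PCA axes.

    Returns (center, axes, half_extents, face_normals). The axes are the
    columns of a 3x3 rotation matrix. This is a PCA approximation, not an
    exact minimum-volume bounding box solver.
    """
    pts = np.asarray(points, dtype=float)
    mean = pts.mean(axis=0)

    # Eigenvectors of the 3x3 covariance matrix serve as the box axes.
    # eigh returns them as columns, ordered by ascending eigenvalue.
    cov = np.cov((pts - mean).T)
    _, axes = np.linalg.eigh(cov)

    # Flip one axis if needed so the frame is right-handed.
    if np.linalg.det(axes) < 0:
        axes[:, 2] *= -1.0

    # Express the points in the box frame and measure the extents there.
    local = (pts - mean) @ axes
    lo, hi = local.min(axis=0), local.max(axis=0)
    half_extents = (hi - lo) / 2.0
    center = mean + axes @ ((lo + hi) / 2.0)

    # Six outward face normals: plus and minus each box axis.
    face_normals = np.vstack([axes.T, -axes.T])
    return center, axes, half_extents, face_normals
```

The six rows of `face_normals` are the quantities a face-normal filter would consume; the cost of the fit itself is linear in the number of points, plus a constant-size 3x3 eigendecomposition.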
📝 Abstract
State-of-the-art 6-DoF grasp generators excel on tabletop benchmarks with overhead cameras but struggle in frontal grasping scenarios on low-cost manipulators with constrained workspaces, where kinematic limits and approach-direction constraints cause high failure rates. We address this challenge for the Unitree Z1 arm by proposing MVB-Grasp, a novel grasping stack that injects a Minimum Volume Bounding Box (MVBB) geometric prior into diffusion-based grasp generation to improve success rates in frontal, workspace-constrained settings. Our key scientific contributions are threefold: (i) an MVBB-based geometric filter that exploits oriented bounding-box face normals to reject, in O(N) time, grasps that approach through the table or are misaligned with accessible object faces; (ii) a combined re-scoring function that blends learned discriminator scores with face-alignment geometry using a blending weight α = 0.85, specifically calibrated for the Z1's frontal workspace and kinematic constraints; and (iii) a systematic MuJoCo evaluation protocol that measures grasp success across object types, distances, lateral positions, and pitch orientations to validate embodiment-specific performance. We implement MVB-Grasp on a Unitree Z1 arm with an Intel RealSense D405 camera, integrating YOLOv8 object detection, GraspGen for candidate generation, Principal Component Analysis (PCA)-based MVBB fitting, and inverse-kinematics trajectory planning. Experiments across 81 MuJoCo episodes (cylinder, asymmetric box, water bottle) demonstrate that MVB-Grasp achieves 59.3% success versus 24.7% for vanilla GraspGen, a 2.4x improvement, by filtering geometrically infeasible candidates and prioritizing face-aligned grasps suited to the Z1's frontal approach constraints. Real-world trials confirm that the MVBB prior substantially improves grasp reliability on constrained, low-cost manipulators without requiring model retraining.
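Contributions (i) and (ii) can be sketched as a single linear pass over the candidates. The abstract does not give the exact formulas, so everything below is an assumption made for illustration: the function name `filter_and_rescore`, the thresholds, the rule for deciding which faces are accessible, and in particular the choice to apply α = 0.85 to the discriminator score and 1 − α to the alignment term.

```python
import numpy as np

ALPHA = 0.85  # value from the abstract; which term it weights is assumed here


def filter_and_rescore(approach_dirs, disc_scores, face_normals,
                       up=(0.0, 0.0, 1.0), min_alignment=0.5, alpha=ALPHA):
    """Return (ranked indices of kept grasps, combined scores for all grasps).

    approach_dirs: (N, 3) nonzero vectors along which the gripper advances.
    disc_scores:   (N,) learned discriminator scores in [0, 1].
    face_normals:  (6, 3) outward unit normals of the bounding box.
    """
    a = np.asarray(approach_dirs, dtype=float)
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    s = np.asarray(disc_scores, dtype=float)
    n = np.asarray(face_normals, dtype=float)
    up = np.asarray(up, dtype=float)

    # Assumption: a face is accessible unless its normal points into the table.
    accessible = n[n @ up > -0.5]

    # A head-on approach to a face moves opposite to its outward normal.
    # At most six faces per grasp, so the total work is O(N).
    alignment = ((-a) @ accessible.T).max(axis=1)

    # Assumption: an upward-moving approach would have to pass through the table.
    from_below = a @ up > 0.0
    keep = (~from_below) & (alignment >= min_alignment)

    # Assumed blend: alpha on the learned score, (1 - alpha) on alignment.
    combined = alpha * s + (1.0 - alpha) * np.clip(alignment, 0.0, 1.0)

    idx = np.flatnonzero(keep)
    order = idx[np.argsort(-combined[idx])]
    return order, combined
```

Under this assumed blend, the learned score dominates and the geometric term mainly reorders candidates with similar discriminator scores, while the hard filter removes candidates regardless of how highly the discriminator rates them.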