🤖 AI Summary
To address the accuracy and generalization bottlenecks in 6-DoF grasp pose estimation from a single RGB image, which stem from limited visual cues and the complexity of real-world objects, this paper proposes Triplane Grasping, a fast grasping decision-making method built on a hybrid Triplane-Gaussian 3D representation for real-time grasp inference. The method combines a triplane decoder and a point decoder with a point-cloud-driven grasp distribution mechanism to regress 6-DoF parallel-jaw grasp poses directly, anchoring candidate grasps at observed 3D points to improve geometric plausibility. Triplane Grasping enables zero-shot cross-object generalization and real-time inference on everyday objects, and experiments demonstrate a high grasp success rate. By unifying a compact geometric representation with end-to-end grasp pose generation, Triplane Grasping offers an efficient and robust paradigm for single-image-driven robotic grasping.
📝 Abstract
Reliable object grasping is one of the fundamental tasks in robotics. However, determining grasp poses from single-image input has long been a challenge due to limited visual information and the complexity of real-world objects. In this paper, we propose Triplane Grasping, a fast grasping decision-making method that relies solely on a single RGB image as input. Triplane Grasping builds a hybrid Triplane-Gaussian 3D representation through a point decoder and a triplane decoder, yielding an efficient, high-quality reconstruction of the target object that meets real-time grasping requirements. We propose an end-to-end network that generates 6-DoF parallel-jaw grasp distributions directly from 3D points in the point cloud, treating them as potential grasp contacts and anchoring the grasp poses in the observed data. Experiments demonstrate that our method achieves rapid modeling and grasp pose decision-making for everyday objects, and exhibits a high grasp success rate in zero-shot scenarios.
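To make the representation concrete: a triplane decoder factorizes a 3D feature volume into three axis-aligned 2D feature planes (XY, XZ, YZ); a 3D point is featurized by projecting it onto each plane, bilinearly sampling, and aggregating the results. The sketch below illustrates only this generic tri-plane query step, not the authors' network; the resolutions, channel counts, and summation-based aggregation are illustrative assumptions.

```python
import numpy as np

def make_triplanes(resolution=32, channels=8, seed=0):
    """Three axis-aligned feature planes (XY, XZ, YZ); here random, normally decoder outputs."""
    rng = np.random.default_rng(seed)
    return [rng.standard_normal((resolution, resolution, channels)) for _ in range(3)]

def bilinear_sample(plane, u, v):
    """Bilinearly sample an (R, R, C) feature plane at continuous coords u, v in [0, 1]."""
    r = plane.shape[0] - 1
    x, y = u * r, v * r
    x0, y0 = int(np.floor(x)), int(np.floor(y))
    x1, y1 = min(x0 + 1, r), min(y0 + 1, r)
    wx, wy = x - x0, y - y0
    return ((1 - wx) * (1 - wy) * plane[x0, y0]
            + wx * (1 - wy) * plane[x1, y0]
            + (1 - wx) * wy * plane[x0, y1]
            + wx * wy * plane[x1, y1])

def query_triplanes(planes, point):
    """Project a 3D point (coords in [0, 1]^3) onto the three planes and sum the features."""
    x, y, z = point
    f_xy = bilinear_sample(planes[0], x, y)
    f_xz = bilinear_sample(planes[1], x, z)
    f_yz = bilinear_sample(planes[2], y, z)
    return f_xy + f_xz + f_yz  # a small decoder head would map this to geometry / grasp cues

planes = make_triplanes()
feat = query_triplanes(planes, (0.3, 0.7, 0.5))
print(feat.shape)  # (8,)
```

The appeal of this factorization is memory: three R x R planes scale quadratically with resolution, versus cubic growth for a dense voxel grid, which is what makes real-time reconstruction from a single image tractable.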