Triplane Grasping: Efficient 6-DoF Grasping with Single RGB Images

📅 2024-10-21
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the accuracy and generalization bottlenecks in 6-DoF grasp pose estimation from a single RGB image—caused by limited visual cues and object complexity—this paper proposes Tri-Plane Gaussian Mixture (TP-GM), a lightweight, end-to-end, differentiable 3D representation for real-time grasp inference. Our method integrates a tri-plane decoder with a point-cloud-driven grasp distribution generation mechanism to directly regress 6-DoF grasp poses; parallel gripper modeling is further introduced to enhance geometric plausibility. TP-GM enables zero-shot cross-object generalization and achieves millisecond-level inference speed on everyday objects. Experimental results demonstrate significantly higher grasp success rates than current state-of-the-art methods. By unifying compact geometric representation with task-specific differentiable optimization, TP-GM establishes a novel, efficient, and robust paradigm for single-image-driven robotic grasping.

📝 Abstract
Reliable object grasping is one of the fundamental tasks in robotics. However, determining a grasp pose from a single-image input has long been a challenge due to limited visual information and the complexity of real-world objects. In this paper, we propose Triplane Grasping, a fast grasp decision-making method that relies solely on a single RGB image as input. Triplane Grasping creates a hybrid Triplane-Gaussian 3D representation through a point decoder and a triplane decoder, which together produce an efficient, high-quality reconstruction of the object to be grasped, meeting real-time grasping requirements. An end-to-end network generates 6-DoF parallel-jaw grasp distributions directly from 3D points of the reconstructed point cloud, treating them as potential grasp contacts and anchoring the grasp pose in the observed data. Experiments demonstrate that our method achieves rapid modeling and grasp-pose decision-making for everyday objects, and exhibits a high grasp success rate in zero-shot scenarios.
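The paper's implementation is not reproduced here, but the tri-plane half of the hybrid representation can be illustrated with a minimal sketch. In a typical tri-plane design, a 3D query point is projected onto three axis-aligned feature planes, each plane is sampled bilinearly, and the three results are aggregated into one feature vector. The function name, plane ordering, and summation-based aggregation below are assumptions following common tri-plane formulations, not the paper's actual code:

```python
import numpy as np

def query_triplane(planes, pts, extent=1.0):
    """Sample per-point features from three axis-aligned feature planes.

    planes: (3, C, R, R) array -- hypothetical XY, XZ, YZ feature grids.
    pts:    (N, 3) array of coordinates in [-extent, extent].
    Returns an (N, C) feature array, summed over the three planes.
    """
    _, C, R, _ = planes.shape
    # Project each 3D point onto the three canonical planes.
    projections = [pts[:, [0, 1]], pts[:, [0, 2]], pts[:, [1, 2]]]
    feats = np.zeros((pts.shape[0], C))
    for plane, uv in zip(planes, projections):
        # Map coordinates from [-extent, extent] to pixel space [0, R-1].
        g = (uv / extent * 0.5 + 0.5) * (R - 1)
        x0 = np.clip(np.floor(g[:, 0]).astype(int), 0, R - 1)
        y0 = np.clip(np.floor(g[:, 1]).astype(int), 0, R - 1)
        x1 = np.clip(x0 + 1, 0, R - 1)
        y1 = np.clip(y0 + 1, 0, R - 1)
        wx, wy = g[:, 0] - x0, g[:, 1] - y0
        # Bilinear interpolation of the C-channel grid at each projection.
        f = (plane[:, y0, x0] * (1 - wx) * (1 - wy)
             + plane[:, y0, x1] * wx * (1 - wy)
             + plane[:, y1, x0] * (1 - wx) * wy
             + plane[:, y1, x1] * wx * wy)
        feats += f.T
    return feats
```

In a full pipeline these features would feed a small decoder (e.g. predicting Gaussian or occupancy parameters); summing the three planes is one common aggregation choice, concatenation is another.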
Problem

Research questions and friction points this paper is trying to address.

Efficient 6-DoF grasping from single RGB images
Real-time 3D reconstruction for grasping decisions
Generalization across diverse object datasets
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses Triplane-Gaussian 3D representation for reconstruction
End-to-end network generates 6-DoF grasp distributions
Anchors grasp pose directly in observed point cloud
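To make the last bullet concrete: anchoring a 6-DoF parallel-jaw grasp in the observed data typically means sampling a contact point from the reconstructed cloud and predicting, per contact, an approach direction, a finger-baseline direction, and an opening width, from which a full gripper pose follows. The sketch below is a generic contact-anchored construction under those assumptions, not the paper's network head:

```python
import numpy as np

def grasp_pose_from_contact(contact, approach, baseline, width):
    """Build a 4x4 parallel-jaw grasp pose anchored at a point-cloud contact.

    contact:  (3,) candidate contact point sampled from the cloud.
    approach: (3,) predicted direction the gripper approaches along.
    baseline: (3,) predicted direction from this contact to the opposite finger.
    width:    predicted gripper opening (metres).
    All inputs here are hypothetical network outputs.
    """
    a = approach / np.linalg.norm(approach)
    # Gram-Schmidt: make the finger axis orthogonal to the approach axis.
    b = baseline - np.dot(baseline, a) * a
    b = b / np.linalg.norm(b)
    y = np.cross(a, b)                  # completes a right-handed frame
    R = np.stack([b, y, a], axis=1)     # columns: finger, lateral, approach
    center = contact + 0.5 * width * b  # midpoint between the two fingertips
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = center
    return T
```

Because the pose is parameterized relative to an observed surface point, every candidate grasp stays geometrically tied to the reconstruction rather than floating freely in SE(3).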
Yiming Li
Computer Science Department, Faculty of Science and Engineering, Swansea University, UK
Hanchi Ren
Computer Science Department, Faculty of Science and Engineering, Swansea University, UK
Jingjing Deng
Senior Lecturer, University of Bristol
Computer Vision, Machine Learning
Xianghua Xie
Computer Science Department, Faculty of Science and Engineering, Swansea University, UK