KGN-Pro: Keypoint-Based Grasp Prediction through Probabilistic 2D-3D Correspondence Learning

📅 2025-07-20
🤖 AI Summary
Existing 6-DoF grasp estimation methods for high-level robotic manipulation either work directly on point clouds, where small objects and sensor noise degrade results, or infer 3D information from RGB images, which requires costly annotations and discretized pose representations. To address these limitations, this paper proposes an end-to-end grasp pose estimation framework based on probabilistic 2D-3D keypoint correspondence learning. The method introduces a differentiable probabilistic PnP layer that enables joint optimization of 2D keypoint predictions and 3D pose estimation. An RGB-D encoder network generates keypoint maps and 2D confidence maps, and the confidence maps weight each keypoint's contribution to the re-projection error. Because the PnP step is differentiable, 3D supervision propagates back to the 2D keypoint predictions, which removes the reliance on 2D supervision alone that limited earlier keypoint-based methods. Experiments in both simulation and real-world settings show that the method outperforms existing approaches in grasp cover rate and success rate.

📝 Abstract
High-level robotic manipulation tasks demand flexible 6-DoF grasp estimation as a basic capability. Previous approaches either directly generate grasps from point-cloud data, suffering from challenges with small objects and sensor noise, or infer 3D information from RGB images, which introduces expensive annotation requirements and discretization issues. Recent methods mitigate some challenges by retaining a 2D representation to estimate grasp keypoints and applying Perspective-n-Point (PnP) algorithms to compute 6-DoF poses. However, these methods are limited by their non-differentiable nature and their reliance on 2D supervision alone, which hinders the full exploitation of rich 3D information. In this work, we present KGN-Pro, a novel grasping network that preserves the efficiency and fine-grained object grasping of previous KGNs while integrating direct 3D optimization through probabilistic PnP layers. KGN-Pro encodes paired RGB-D images to generate a keypoint map, and further outputs a 2D confidence map to weight keypoint contributions during re-projection error minimization. By modeling the weighted sum of squared re-projection errors probabilistically, the network effectively transmits 3D supervision to its 2D keypoint predictions, enabling end-to-end learning. Experiments on both simulated and real-world platforms demonstrate that KGN-Pro outperforms existing methods in terms of grasp cover rate and success rate.
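
The abstract does not give the objective itself. As a rough sketch, probabilistic PnP formulations in general write the weighted sum of squared re-projection errors as the exponent of a pose likelihood; whether KGN-Pro uses exactly this form is an assumption. All symbols here are introduced for illustration: x_i is a 3D grasp keypoint in the gripper frame, u_i its predicted 2D location, w_i the per-keypoint 2D confidence weight, π the camera projection, and (R, t) the grasp pose.

```latex
\begin{aligned}
r_i(R, t) &= w_i \circ \bigl(\pi(R\,x_i + t) - u_i\bigr), \\
p(u, w \mid R, t) &\propto \exp\Bigl(-\tfrac{1}{2} \sum_{i=1}^{N} \lVert r_i(R, t) \rVert^{2}\Bigr).
\end{aligned}
```

Under this reading, the exponent is differentiable in both u_i and w_i, so a loss defined on the resulting distribution over poses can pass gradients from 3D pose supervision back to the 2D predictions.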
Problem

Research questions and friction points this paper is trying to address.

Improves 6-DoF grasp estimation for robotic manipulation
Addresses challenges with small objects and sensor noise
Enables end-to-end learning with 3D supervision
Innovation

Methods, ideas, or system contributions that make the work stand out.

Probabilistic PnP layers for 3D optimization
RGB-D encoding for Keypoint Map generation
2D confidence map that weights keypoint contributions during re-projection error minimization
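
To make the role of the confidence map concrete, here is a minimal NumPy sketch of a confidence-weighted re-projection cost for one candidate grasp pose. It is an illustration under stated assumptions, not the paper's implementation: the function names, the camera intrinsics, the four gripper keypoints, and the weights are all hypothetical, and the paper's network, PnP layer, and training loss are not reproduced.

```python
import numpy as np


def project(points_cam: np.ndarray, K: np.ndarray) -> np.ndarray:
    """Pinhole projection of (N, 3) camera-frame points to (N, 2) pixels."""
    uvw = points_cam @ K.T
    return uvw[:, :2] / uvw[:, 2:3]


def weighted_reprojection_cost(R, t, pts3d, kpts2d, weights, K) -> float:
    """Half the sum of squared, confidence-weighted re-projection residuals.

    pts3d:   (N, 3) keypoints in the gripper frame (hypothetical layout).
    kpts2d:  (N, 2) predicted 2D keypoints in pixels.
    weights: (N, 2) per-keypoint, per-axis confidence weights.
    """
    pts_cam = pts3d @ R.T + t          # gripper frame -> camera frame
    residual = weights * (project(pts_cam, K) - kpts2d)
    return 0.5 * float(np.sum(residual ** 2))


# Hypothetical camera intrinsics and gripper keypoints (metres).
K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])
pts3d = np.array([[-0.04, 0.0, 0.00],
                  [0.04, 0.0, 0.00],
                  [-0.04, 0.0, 0.06],
                  [0.04, 0.0, 0.06]])
R_true = np.eye(3)
t_true = np.array([0.0, 0.0, 0.5])

# Ideal 2D keypoints, then a copy where keypoint 0 is off by 20 px.
kpts2d = project(pts3d @ R_true.T + t_true, K)
kpts_noisy = kpts2d.copy()
kpts_noisy[0] += 20.0

uniform = np.ones((4, 2))
downweighted = uniform.copy()
downweighted[0] = 0.0                  # low confidence on the bad keypoint

cost_clean = weighted_reprojection_cost(R_true, t_true, pts3d, kpts2d, uniform, K)
cost_noisy = weighted_reprojection_cost(R_true, t_true, pts3d, kpts_noisy, uniform, K)
cost_down = weighted_reprojection_cost(R_true, t_true, pts3d, kpts_noisy, downweighted, K)
print(cost_clean, cost_noisy, cost_down)
```

With uniform weights, the corrupted keypoint penalizes the true pose. Once its weight drops to zero, the true pose again has zero cost, which is the effect a learned confidence map is meant to have on unreliable keypoints.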
Authors

Bingran Chen
Zhejiang University
Baorun Li
Zhejiang University
robotics, manipulation, SLAM
Jian Yang
China Research and Development Academy of Machinery Equipment
Yong Liu
Zhejiang University, State Key Laboratory of Industrial Control Technology
Guangyao Zhai
Technical University of Munich; ETH Zurich
Generative AI, Embodied AI