RCGNet: RGB-based Category-Level 6D Object Pose Estimation with Geometric Guidance

📅 2025-08-19
🤖 AI Summary
Existing category-level 6D pose estimation methods rely on depth data, which makes them hard to deploy in real-world RGB-only scenarios. To address this, the paper proposes an end-to-end, RGB-only approach with three key components: (1) a geometry-aware Transformer network that explicitly encodes 3D structural priors of object categories; (2) a learnable geometric feature guidance mechanism that improves robustness to scale variation and occlusion; and (3) a lightweight RANSAC-PnP solver that computes the final pose. Evaluated on standard benchmarks including NOCS and Occlusion LINEMOD, the method is reported to outperform existing RGB-only approaches in both pose accuracy and inference efficiency. These results support the feasibility and practicality of category-level 6D pose estimation without depth input.

📝 Abstract
While most current RGB-D-based category-level object pose estimation methods achieve strong performance, they face significant challenges in scenes lacking depth information. In this paper, we propose a novel category-level object pose estimation approach that relies solely on RGB images. This method enables accurate pose estimation in real-world scenarios without the need for depth data. Specifically, we design a transformer-based neural network for category-level object pose estimation, where the transformer is employed to predict and fuse the geometric features of the target object. To ensure that these predicted geometric features faithfully capture the object's geometry, we introduce a geometric feature-guided algorithm, which enhances the network's ability to effectively represent the object's geometric information. Finally, we utilize the RANSAC-PnP algorithm to compute the object's pose, addressing the challenges associated with variable object scales in pose estimation. Experimental results on benchmark datasets demonstrate that our approach is not only highly efficient but also achieves superior accuracy compared to previous RGB-based methods. These promising results offer a new perspective for advancing category-level object pose estimation using RGB images.
Problem

Research questions and friction points this paper is trying to address.

Estimating category-level 6D object pose without depth data
Predicting geometric features from RGB images for pose estimation
Handling scale variations in objects for accurate pose computation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Transformer network predicts geometric features
Geometric feature-guided algorithm enhances representation
RANSAC-PnP algorithm computes object pose
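
The last step, recovering a pose from 2D-3D correspondences with RANSAC-PnP, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. It assumes a pinhole camera with known intrinsics `K`, uses a simple linear (DLT) solver on 6-point samples in place of whatever minimal solver the paper uses, and runs on synthetic correspondences. All function names, thresholds, and data below are hypothetical.

```python
import numpy as np

def project(K, R, t, X):
    """Project Nx3 object points X to pixels with pose (R, t) and intrinsics K."""
    Xc = X @ R.T + t                  # object frame -> camera frame
    uv = Xc @ K.T                     # apply pinhole intrinsics
    return uv[:, :2] / uv[:, 2:3]

def pnp_dlt(K, X, uv):
    """Linear (DLT) PnP from >= 6 non-coplanar 2D-3D correspondences.

    Assumption: a plain DLT stands in for the solver inside RANSAC-PnP.
    """
    # Remove the intrinsics so that x ~ [R | t] X holds directly.
    xn = np.linalg.solve(K, np.column_stack([uv, np.ones(len(uv))]).T).T
    Xh = np.column_stack([X, np.ones(len(X))])
    zeros = np.zeros_like(Xh)
    A = np.vstack([
        np.hstack([Xh, zeros, -xn[:, :1] * Xh]),
        np.hstack([zeros, Xh, -xn[:, 1:2] * Xh]),
    ])
    P = np.linalg.svd(A)[2][-1].reshape(3, 4)   # null vector of A, up to scale
    if np.linalg.det(P[:, :3]) < 0:             # resolve the sign ambiguity
        P = -P
    U, S, Vt = np.linalg.svd(P[:, :3])
    return U @ Vt, P[:, 3] / S.mean()           # nearest rotation, rescaled t

def ransac_pnp(K, X, uv, iters=200, thresh_px=2.0, seed=0):
    """Keep the pose hypothesis with the most reprojection inliers, then refit."""
    rng = np.random.default_rng(seed)
    best = np.zeros(len(X), dtype=bool)
    for _ in range(iters):
        idx = rng.choice(len(X), size=6, replace=False)
        R, t = pnp_dlt(K, X[idx], uv[idx])
        err = np.linalg.norm(project(K, R, t, X) - uv, axis=1)
        inliers = err < thresh_px
        if inliers.sum() > best.sum():
            best = inliers
    if best.sum() < 6:
        raise ValueError("RANSAC found no usable consensus set")
    R, t = pnp_dlt(K, X[best], uv[best])        # refit on the consensus set
    return R, t, best

# Synthetic check (hypothetical data): 60 matches, the first 12 corrupted.
rng = np.random.default_rng(0)
K = np.array([[600.0, 0.0, 320.0], [0.0, 600.0, 240.0], [0.0, 0.0, 1.0]])
a = 0.5
R_true = np.array([[np.cos(a), 0.0, np.sin(a)],
                   [0.0, 1.0, 0.0],
                   [-np.sin(a), 0.0, np.cos(a)]])
t_true = np.array([0.1, -0.05, 2.0])
X = rng.uniform(-0.5, 0.5, size=(60, 3))
uv = project(K, R_true, t_true, X)
uv[:12] += rng.uniform(30.0, 80.0, size=(12, 2))

R_est, t_est, inliers = ransac_pnp(K, X, uv)
print("inliers:", int(inliers.sum()))
print("rotation error:", np.abs(R_est - R_true).max())
print("translation error:", np.abs(t_est - t_true).max())
```

In practice a library routine such as OpenCV's `cv2.solvePnPRansac` would usually replace the hand-written solver. As a general property of PnP, independent of this paper: if the 3D points are expressed in a size-normalized object frame, the recovered translation is in units of the object size and has to be multiplied by the metric size to obtain a metric translation.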
Sheng Yu
School of Automation, Beijing Institute of Technology, Beijing 100081, China
Di-Hua Zhai
School of Automation, Beijing Institute of Technology
robotics, networked robots, computer vision, medical image processing, nonlinear control
Yuanqing Xia
Zhongyuan University of Technology, Zhengzhou 450007, Henan, China, and also with School of Automation, Beijing Institute of Technology, Beijing 100081, China