Symmetry-Aware 9D Pose Estimation with Sim(3)-Consistent Feature and Spherical Inception Convolution

📅 2026-06-01

🤖 AI Summary
This work addresses the challenges of category-level 9D pose estimation—namely, poor generalization to unseen objects, the complexity of nonlinear modeling in Sim(3) space, and large intra-category shape variations—by introducing a symmetry-aware estimation framework. Without relying on explicit shape priors, the method jointly estimates translation and scale through a semantics-guided symmetric point prediction module. It further incorporates spherical large-kernel Inception convolutions to effectively fuse semantic features from large vision models with geometric constraints, thereby capturing long-range dependencies to enhance rotation estimation accuracy. The approach achieves state-of-the-art performance across multiple benchmark datasets and real-world scenarios, and has been successfully deployed in a robust multi-object robotic grasping system.
📝 Abstract
Object pose estimation is a fundamental problem for an agent system to perceive or manipulate objects in images or videos. However, current instance-level methods struggle with generalization to unseen objects. Category-level methods seek to address this, but remain constrained by the complexities of learning in the non-linear Sim(3) space and by intra-class variations. To address these challenges, we propose an effective method for category-level object pose estimation with two key innovations: (1) A translation/size estimator, featuring a semantic-guided symmetry-aware module that leverages the robust generalization capabilities of a large vision model (LVM) to infer symmetry points, resulting in accurate translation and size without shape priors. This result serves as a precomputed cue for rotation estimation, thereby reducing the difficulty of learning in the non-linear Sim(3) space and laying a robust foundation for tackling the inherently more challenging rotation estimation. (2) A feature fusion module, based on our proposed spherical large-kernel inception convolution, that fuses semantic features from the LVM with systematically computed geometric features to extract essential pose features from intra-class variations by modeling long-range dependencies without excessive computational cost. Built on these innovations, we achieve state-of-the-art (SOTA) results on benchmarks and real-world scenes, while developing a robust robotic picking system capable of handling diverse objects. Our code will be available at the project page: https://panfei-cheng.github.io/SSH-Pose.
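To make the "9D" in the abstract concrete: a category-level pose is commonly described by a rotation (3 degrees of freedom), a translation (3), and a per-axis size (3). The sketch below is not the authors' code. It only illustrates, under the assumed convention that canonical points are scaled per axis, then rotated, then translated, how such a pose maps points between the canonical object frame and the camera frame. The function names `rotation_z`, `apply_pose`, and `invert_pose` are hypothetical and chosen for this illustration.

```python
import numpy as np


def rotation_z(angle_rad):
    """Rotation matrix for a rotation of angle_rad about the z-axis."""
    c, s = np.cos(angle_rad), np.sin(angle_rad)
    return np.array([[c, -s, 0.0],
                     [s, c, 0.0],
                     [0.0, 0.0, 1.0]])


def apply_pose(points, rotation, translation, size):
    """Map (N, 3) canonical points to the camera frame.

    Assumed convention (not taken from the paper):
    scale per axis, then rotate, then translate.
    """
    scaled = points * size                      # per-axis size, 3 DoF
    return scaled @ rotation.T + translation    # rotation 3 DoF, translation 3 DoF


def invert_pose(points, rotation, translation, size):
    """Map (N, 3) camera-frame points back to the canonical frame."""
    unrotated = (points - translation) @ rotation  # row-vector form of R^T (p - t)
    return unrotated / size


if __name__ == "__main__":
    canonical = np.array([[0.5, 0.5, 0.5],
                          [-0.5, -0.5, -0.5]])
    R = rotation_z(np.pi / 2)
    t = np.array([1.0, 2.0, 3.0])
    s = np.array([2.0, 4.0, 6.0])

    observed = apply_pose(canonical, R, t, s)
    recovered = invert_pose(observed, R, t, s)
    print(observed)
    print(np.allclose(recovered, canonical))
```

Under this convention, once translation and size are known, `invert_pose` with an identity rotation already centers and normalizes the observed points, so only the rotation remains unknown. That is one way to read the abstract's remark that translation and size serve as a precomputed cue for rotation estimation.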
Problem

Research questions and friction points this paper is trying to address.

object pose estimation
category-level
Sim(3) space
intra-class variations
generalization
Innovation

Methods, ideas, or system contributions that make the work stand out.

Sim(3)-Consistent Feature
Spherical Inception Convolution
Symmetry-Aware Pose Estimation
Category-Level 9D Pose
Large Vision Model
Panfei Cheng
National Engineering Research Center for Robot Visual Perception and Control, School of Robotics and Artificial Intelligence, Hunan University, Changsha 410012, China
Hongshan Yu
National Engineering Research Center for Robot Visual Perception and Control, School of Robotics and Artificial Intelligence, Hunan University, Changsha 410012, China
Wenrui Chen
Hunan University
Robotics, Hands, Grasping, Dexterous Manipulation, Human-Robot Collaboration
Xiaojun Tang
Beijing Spacecrafts, China Academy of Space Technology, Beijing 100094, China
Jian Liu
National Engineering Research Center for Robot Visual Perception and Control, School of Robotics and Artificial Intelligence, Hunan University, Changsha 410012, China
Naveed Akhtar
The University of Melbourne
Computer Vision, Pattern Recognition, Robotics, Remote Sensing