SCAPO: Self-Supervised Category-Level Articulated Pose Estimation from a Single 3D Observation

📅 2026-06-01
📈 Citations: 0
Influential: 0

🤖 AI Summary
This work addresses category-level pose estimation for articulated objects from a single RGB-D observation without requiring dense supervision, CAD templates, or multi-frame inputs. The authors propose a self-supervised method that uses an SE(3)-equivariant vector-neuron autoencoder to factor out global pose and align observations into a shared canonical space, coupled with a joint-aware linear blend skinning module that models part motion. Together these recover shared category-level geometry, rigid part segmentation, and explicit joint parameters: joint axes, pivot points, and articulation states. Training uses no ground-truth labels and relies on cycle reconstruction between observed and canonical shapes plus alignment with a learnable canonical template, which separates shared category geometry from instance-specific residual shape. On synthetic and real articulated-object datasets, the method recovers consistent part structure and accurate articulation parameters and outperforms all self-supervised baselines.
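To make the joint-aware blend-skinning idea concrete, here is a minimal NumPy sketch. It is not the authors' implementation: it assumes revolute joints only, and the names `rodrigues`, `revolute_transform`, and `blend_skinning` are hypothetical helpers introduced for illustration. Each part gets a rigid transform defined by a joint axis, a pivot, and an angle, and each point is moved by the weighted blend of the per-part transforms.

```python
import numpy as np


def rodrigues(axis, angle):
    """Rotation matrix for `angle` radians about the direction `axis` (Rodrigues' formula)."""
    axis = np.asarray(axis, dtype=float)
    axis = axis / np.linalg.norm(axis)
    x, y, z = axis
    # Skew-symmetric cross-product matrix of the unit axis.
    K = np.array([[0.0, -z, y], [z, 0.0, -x], [-y, x, 0.0]])
    return np.eye(3) + np.sin(angle) * K + (1.0 - np.cos(angle)) * (K @ K)


def revolute_transform(axis, pivot, angle):
    """Rigid transform (R, t) for a rotation about the line through `pivot` along `axis`."""
    R = rodrigues(axis, angle)
    pivot = np.asarray(pivot, dtype=float)
    # x -> R (x - pivot) + pivot  ==  R x + (pivot - R pivot)
    t = pivot - R @ pivot
    return R, t


def blend_skinning(points, weights, transforms):
    """Linear blend skinning (illustrative).

    points:     (N, 3) canonical points
    weights:    (N, P) soft part assignment, rows sum to 1
    transforms: list of P tuples (R, t), one rigid transform per part
    """
    # Apply every part transform to every point: shape (N, P, 3).
    moved = np.stack([points @ R.T + t for R, t in transforms], axis=1)
    # Blend the candidate positions with the per-point part weights.
    return np.einsum("np,npd->nd", weights, moved)


# Toy object: a static base (part 0) and a lid (part 1) hinged at (1, 0, 0) about the z-axis.
points = np.array([[0.0, 0.0, 0.0], [2.0, 0.0, 0.0]])
weights = np.array([[1.0, 0.0], [0.0, 1.0]])  # hard segmentation for clarity
transforms = [
    (np.eye(3), np.zeros(3)),
    revolute_transform(axis=[0, 0, 1], pivot=[1, 0, 0], angle=np.pi / 2),
]
articulated = blend_skinning(points, weights, transforms)
```

With hard weights this reduces to moving each rigid part by its own transform. Soft weights let a segmentation be learned by gradient descent, which is presumably why a skinning formulation is attractive in a self-supervised setting.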
📝 Abstract
Existing methods for category-level object articulation from a single 3D observation often rely on dense supervision, multi-frame inputs, or CAD templates, and still struggle to disentangle geometry from articulation or to recover explicit joint parameters. We propose SCAPO, a self-supervised framework that estimates canonical geometry, rigid part segmentation, and joint pivots, axes, and articulation states from a single RGB-D observation without ground-truth labels or category-specific models. SCAPO first uses an SE(3)-equivariant vector-neuron autoencoder to factor out global pose and align diverse instances into a shared canonical space. On this aligned shape, a joint-aware blend-skinning module then models part motion. We learn this representation through cycle reconstruction between observed and canonical shapes and through cross-space alignment with a learnable canonical template that decouples shared category geometry from instance-specific residual shape. Experiments on synthetic and real articulated-object datasets show that SCAPO recovers consistent part structure and accurate articulation parameters and outperforms all self-supervised baselines.
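The equivariance property that lets an encoder factor out global pose can be checked numerically. The sketch below is an assumption-laden illustration, not SCAPO's network: it shows only the channel-mixing linear layer from the vector-neuron family plus mean-centering to handle translation, and `vn_linear`, `center`, and `random_rotation` are hypothetical names. Rotating and translating the input should rotate the output features by the same rotation.

```python
import numpy as np


def vn_linear(features, weight):
    """Vector-neuron style linear layer (illustrative).

    features: (C, 3) array, each channel is a 3D vector
    weight:   (C_out, C) matrix that mixes channels and never touches the xyz axis
    """
    return weight @ features


def center(points):
    """Subtract the centroid so the result no longer depends on global translation."""
    return points - points.mean(axis=0, keepdims=True)


def random_rotation(rng):
    """Random proper rotation from a QR decomposition (determinant forced to +1)."""
    q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
    if np.linalg.det(q) < 0:
        q[:, 0] = -q[:, 0]
    return q


rng = np.random.default_rng(0)
points = rng.normal(size=(16, 3))   # toy point cloud, one vector channel per point
weight = rng.normal(size=(8, 16))   # toy layer weights
R = random_rotation(rng)
t = rng.normal(size=3)

out_original = vn_linear(center(points), weight)
out_moved = vn_linear(center(points @ R.T + t), weight)

# Equivariance: features of the moved cloud equal the rotated features of the original.
equivariant = np.allclose(out_moved, out_original @ R.T)
```

Because the weights act only on the channel dimension, the rotation passes through the layer unchanged, which is what allows a downstream module to estimate that rotation and undo it to reach a canonical frame.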
Problem

Research questions and friction points this paper is trying to address.

articulated pose estimation
category-level
single 3D observation
geometry-articulation disentanglement
joint parameter recovery
Innovation

Methods, ideas, or system contributions that make the work stand out.

self-supervised
articulated pose estimation
SE(3)-equivariant
canonical shape
joint parameter recovery
🔎 Similar Papers