RI-MAE: Rotation-Invariant Masked AutoEncoders for Self-Supervised Point Cloud Representation Learning

📅 2024-08-31
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the sensitivity to object rotation and insufficient feature robustness in point cloud self-supervised representation learning, this paper proposes RI-MAE, the first end-to-end rotation-invariant masked autoencoder framework. Methodologically, the authors design the RI-Transformer architecture, incorporating rotation-invariant relative orientation and position embeddings, and adopt a dual-branch teacher-student structure to decouple geometric content encoding from rotation-invariant pose encoding. The model is jointly optimized via masked point cloud reconstruction and self-supervised contrastive distillation. Extensive experiments on standard benchmarks, including ModelNet40, demonstrate significant improvements in rotation robustness, and RI-MAE achieves state-of-the-art performance on downstream classification and segmentation tasks. To foster reproducibility and further research, the source code is publicly released.

📝 Abstract
Masked point modeling methods have recently achieved great success in self-supervised learning for point cloud data. However, these methods are sensitive to rotations and often exhibit sharp performance drops when encountering rotational variations. In this paper, we propose a novel Rotation-Invariant Masked AutoEncoder (RI-MAE) to address two major challenges: 1) achieving rotation-invariant latent representations, and 2) facilitating self-supervised reconstruction in a rotation-invariant manner. For the first challenge, we introduce RI-Transformer, which features disentangled geometry content together with rotation-invariant relative orientation and position embedding mechanisms for constructing a rotation-invariant point cloud latent space. For the second challenge, a novel dual-branch student-teacher architecture is devised. It enables self-supervised learning via the reconstruction of masked patches within the learned rotation-invariant latent space. Each branch is based on an RI-Transformer, and they are connected with an additional RI-Transformer predictor. The teacher encodes all point patches, while the student encodes only the unmasked ones. Finally, the predictor predicts the latent features of the masked patches from the student's output latent embeddings, supervised by the outputs of the teacher. Extensive experiments demonstrate that our method is robust to rotations, achieving state-of-the-art performance on various downstream tasks. Our code is available at https://github.com/kunmingsu07/RI-MAE.
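The dual-branch training loop described in the abstract (teacher encodes all patches, student encodes only visible ones, a predictor regresses the masked latents against the teacher's targets) can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: the `encoder` stand-in, all dimensions, the mean-pooled context, and the EMA rate are assumptions, and the real model uses RI-Transformer branches rather than a linear map.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes; the paper's actual settings may differ.
NUM_PATCHES, DIM, MASK_RATIO = 64, 32, 0.6

def encoder(patches, W):
    """Stand-in for an RI-Transformer branch: one linear map + tanh."""
    return np.tanh(patches @ W)

# Teacher and student share an architecture; the teacher is an EMA copy.
W_student = rng.normal(scale=0.1, size=(DIM, DIM))
W_teacher = W_student.copy()
W_pred = rng.normal(scale=0.1, size=(DIM, DIM))  # predictor head

patches = rng.normal(size=(NUM_PATCHES, DIM))    # rotation-invariant patch tokens

# 1) Random masking: the student sees only the visible subset.
num_masked = int(MASK_RATIO * NUM_PATCHES)
perm = rng.permutation(NUM_PATCHES)
masked_idx, visible_idx = perm[:num_masked], perm[num_masked:]

# 2) Teacher encodes ALL patches (targets); student encodes visible ones.
teacher_latents = encoder(patches, W_teacher)
student_latents = encoder(patches[visible_idx], W_student)

# 3) Predictor estimates the masked latents from the student's output
#    (here crudely: a mean-pooled context passed through a linear head).
context = student_latents.mean(axis=0)
pred_masked = np.tile(context @ W_pred, (num_masked, 1))

# 4) Reconstruction loss in latent space, supervised by the teacher.
loss = np.mean((pred_masked - teacher_latents[masked_idx]) ** 2)

# 5) EMA update of the teacher; no gradients flow into this branch.
EMA = 0.999
W_teacher = EMA * W_teacher + (1 - EMA) * W_student
```

Because the loss is computed between latent features rather than raw points, the reconstruction target itself lives in the rotation-invariant space the RI-Transformer builds, which is the paper's answer to the second challenge.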
Problem

Research questions and friction points this paper is trying to address.

3D Point Clouds
Rotation Invariance
Self-supervised Learning
Innovation

Methods, ideas, or system contributions that make the work stand out.

Rotation Invariance
Masked Autoencoder
Self-supervised Learning
Kunming Su
South China University of Technology
Qiuxia Wu
South China University of Technology
Panpan Cai
South China University of Technology
Xiaogang Zhu
The University of Adelaide
Xuequan Lu
Associate Professor (North American system)
Visual computing, 3D geometry/vision, VR/AR, graphics, deep learning
Zhiyong Wang
The University of Sydney
Kun Hu
The University of Sydney