Asymmetric Dual Self-Distillation for 3D Self-Supervised Representation Learning

📅 2025-06-26

🤖 AI Summary

To learn semantic representations from unlabeled 3D point clouds, this paper proposes AsymDSD, a self-supervised framework that predicts in latent space instead of reconstructing the input, and jointly optimizes masked point modeling (MPM) and invariance learning. Key contributions include: (1) an asymmetric dual self-distillation architecture to enhance representation robustness; (2) disabling attention among masked queries to prevent shape leakage; and (3) multi-mask sampling combined with a point cloud adaptation of multi-crop to strengthen local–global consistency. The method achieves 90.53% accuracy on ScanObjectNN, and 93.72% when pretrained on 930K shapes, surpassing prior state-of-the-art approaches.

📝 Abstract

Learning semantically meaningful representations from unstructured 3D point clouds remains a central challenge in computer vision, especially in the absence of large-scale labeled datasets. While masked point modeling (MPM) is widely used in self-supervised 3D learning, its reconstruction-based objective can limit its ability to capture high-level semantics. We propose AsymDSD, an Asymmetric Dual Self-Distillation framework that unifies masked modeling and invariance learning through prediction in the latent space rather than the input space. AsymDSD builds on a joint embedding architecture and introduces several key design choices: an efficient asymmetric setup, disabling attention between masked queries to prevent shape leakage, multi-mask sampling, and a point cloud adaptation of multi-crop. AsymDSD achieves state-of-the-art results on ScanObjectNN (90.53%) and further improves to 93.72% when pretrained on 930k shapes, surpassing prior methods.
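The idea of disabling attention between masked queries can be illustrated with a small attention mask. This is a minimal sketch under stated assumptions, not the authors' implementation: it assumes the token sequence holds the visible tokens first and the masked queries last, and that each masked query may attend to the visible tokens and to itself. The names `build_attention_mask` and `masked_softmax_attention` are hypothetical.

```python
import numpy as np


def build_attention_mask(num_visible: int, num_masked: int) -> np.ndarray:
    """Return a boolean matrix where allowed[i, j] means query i may attend to key j.

    Assumed layout (not taken from the paper): visible tokens first, masked queries last.
    """
    n = num_visible + num_masked
    allowed = np.zeros((n, n), dtype=bool)
    # Every token may attend to the visible tokens.
    allowed[:, :num_visible] = True
    # Each masked query may also attend to itself, but never to another masked query.
    idx = np.arange(num_visible, n)
    allowed[idx, idx] = True
    return allowed


def masked_softmax_attention(q, k, v, allowed):
    """Single-head scaled dot-product attention that ignores disallowed pairs."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores = np.where(allowed, scores, -np.inf)
    # Subtract the row maximum for numerical stability before exponentiating.
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v
```

With this mask, the output for one masked query does not depend on any other masked query, so the positions of the masked queries cannot jointly reveal the overall shape through attention.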

Problem

Research questions and friction points this paper is trying to address.

Learning meaningful 3D representations without labeled data

Improving high-level semantic capture in self-supervised 3D learning

Unifying masked modeling and invariance learning in latent space
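To make "prediction in latent space" concrete, the sketch below shows one common self-distillation formulation: a student's output distribution is trained to match a teacher's, and the teacher's weights track the student's through an exponential moving average. Everything here is an assumption for illustration (the cross-entropy form, the temperatures, the EMA update, and the names `distillation_loss` and `ema_update`); the summary above does not specify the exact AsymDSD objective.

```python
import numpy as np


def log_softmax(x, axis=-1):
    """Numerically stable log-softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    return x - np.log(np.exp(x).sum(axis=axis, keepdims=True))


def distillation_loss(student_logits, teacher_logits,
                      student_temp=0.1, teacher_temp=0.05):
    """Cross-entropy between teacher and student distributions over latent targets.

    The temperatures are illustrative defaults, not values from the paper.
    """
    teacher_probs = np.exp(log_softmax(teacher_logits / teacher_temp))
    student_log_probs = log_softmax(student_logits / student_temp)
    return float(-(teacher_probs * student_log_probs).sum(axis=-1).mean())


def ema_update(teacher_params, student_params, momentum=0.99):
    """Move each teacher parameter a small step toward the student's (assumed EMA teacher)."""
    return [momentum * t + (1.0 - momentum) * s
            for t, s in zip(teacher_params, student_params)]
```

In such a setup the same loss could serve both objectives: applied to masked positions it acts as masked modeling, and applied to a global embedding of different views it acts as invariance learning.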

Innovation

Methods, ideas, or system contributions that make the work stand out.

Asymmetric Dual Self-Distillation for 3D learning

Latent space prediction over input space

Multi-mask sampling and multi-crop adaptation
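Multi-mask sampling can be sketched as drawing several independent masks over the patches of a single point cloud. The snippet below is a minimal illustration that assumes uniform random masking at a fixed ratio; the function name `sample_masks` and its parameters are hypothetical, and the paper may use a different masking strategy.

```python
import numpy as np


def sample_masks(num_patches, mask_ratio, num_masks, rng=None):
    """Draw `num_masks` independent boolean masks; True marks a masked patch.

    Each mask hides exactly round(mask_ratio * num_patches) patches.
    """
    rng = np.random.default_rng() if rng is None else rng
    num_masked = int(round(mask_ratio * num_patches))
    # Sorting random scores gives an independent random permutation per row.
    order = np.argsort(rng.random((num_masks, num_patches)), axis=1)
    masks = np.zeros((num_masks, num_patches), dtype=bool)
    np.put_along_axis(masks, order[:, :num_masked], True, axis=1)
    return masks


# Example: four masks over 64 patches, each hiding 70% of them (illustrative values).
masks = sample_masks(num_patches=64, mask_ratio=0.7, num_masks=4,
                     rng=np.random.default_rng(0))
```

One plausible benefit of this design is that targets computed once from the full input can be reused across all sampled masks, so the extra masks only add student-side computation.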
