DOS: Distilling Observable Softmaps of Zipfian Prototypes for Self-Supervised Point Representation

📅 2025-12-12
📈 Citations: 0
Influential: 0
🤖 AI Summary
Addressing three key challenges in self-supervised 3D point cloud representation learning—geometric irregularity, reconstruction shortcut effects, and long-tailed semantic distributions—this paper proposes the first soft graph distillation framework operating exclusively on observable points. The method introduces: (1) an observable-point-guided semantic soft graph self-distillation mechanism; (2) Zipfian prior modeling of prototype usage frequency, integrated with a differentiable Zipf-Sinkhorn algorithm that dynamically controls soft graph sharpness and prototype assignment; and (3) a differentiable mask-aware semantic distillation strategy. Evaluated on nuScenes, Waymo, SemanticKITTI, ScanNet, and ScanNet200, the approach consistently outperforms state-of-the-art methods under fully unsupervised (zero-label) settings, achieving significant gains in both 3D semantic segmentation and 3D object detection.

📝 Abstract
Recent advances in self-supervised learning (SSL) have shown tremendous potential for learning 3D point cloud representations without human annotations. However, SSL for 3D point clouds still faces critical challenges due to irregular geometry, shortcut-prone reconstruction, and imbalanced semantic distributions. In this work, we propose DOS (Distilling Observable Softmaps), a novel SSL framework that self-distills semantic relevance softmaps only at observable (unmasked) points. This strategy prevents information leakage from masked regions and provides richer supervision than discrete token-to-prototype assignments. To address the challenge of imbalanced semantics in an unsupervised setting, we introduce Zipfian prototypes and incorporate them using a modified Sinkhorn-Knopp algorithm, Zipf-Sinkhorn, which enforces a power-law prior over prototype usage and modulates the sharpness of the target softmap during training. DOS outperforms current state-of-the-art methods on semantic segmentation and 3D object detection across multiple benchmarks, including nuScenes, Waymo, SemanticKITTI, ScanNet, and ScanNet200, without relying on extra data or annotations. Our results demonstrate that observable-point softmap distillation offers a scalable and effective paradigm for learning robust 3D representations.
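The abstract does not spell out the distillation loss, but the core idea—supervising the student with a teacher softmap only at observable (unmasked) points—can be sketched as a generic teacher-student soft-label cross-entropy restricted to unmasked positions. This is a minimal NumPy sketch, not the paper's implementation; the function and parameter names (`observable_softmap_loss`, `tau_s`, `tau_t`) are hypothetical:

```python
import numpy as np

def observable_softmap_loss(student_logits, teacher_logits, mask,
                            tau_s=0.1, tau_t=0.05):
    """Distill teacher softmaps into the student at observable points only.

    student_logits, teacher_logits: (N, K) point-to-prototype similarities.
    mask: (N,) boolean, True where the point was masked out of the input.
    tau_s, tau_t: student/teacher temperatures (hypothetical values).
    """
    def softmax(x, tau):
        z = x / tau
        z -= z.max(axis=1, keepdims=True)  # numerical stability
        e = np.exp(z)
        return e / e.sum(axis=1, keepdims=True)

    obs = ~mask                                # supervise only observable points
    p_t = softmax(teacher_logits[obs], tau_t)  # sharper teacher target
    p_s = softmax(student_logits[obs], tau_s)
    # Cross-entropy H(p_t, p_s), averaged over observable points; computing it
    # only on `obs` is what blocks supervision (and leakage) at masked regions.
    return -(p_t * np.log(p_s + 1e-9)).sum(axis=1).mean()
```

Restricting the loss to `obs` is the mechanism the abstract credits with preventing information leakage: masked points contribute nothing to the gradient, so the student cannot shortcut by copying masked-region content.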
Problem

Research questions and friction points this paper is trying to address.

Self-supervised learning for 3D point clouds must cope with irregular geometry and shortcut-prone reconstruction objectives.
Semantic categories follow a long-tailed (unbalanced) distribution, which is hard to handle without labels; the paper addresses this with Zipfian prototypes.
Masked-point reconstruction can leak information from masked regions; the method prevents this by distilling semantic relevance only at observable points.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Self-distills semantic relevance softmaps at observable points only
Uses Zipfian prototypes with a modified Sinkhorn-Knopp algorithm
Prevents information leakage from masked regions during training