🤖 AI Summary
This paper addresses unsupervised 3D point cloud semantic segmentation, i.e., assigning a semantic class to every point without any human annotations. Existing methods rely solely on local per-point features and therefore fail to capture global semantic priors. To overcome this limitation, we propose LogoSP, which, for the first time, models the global distribution of superpoints in the spectral (frequency) domain. The approach combines local geometric structure with global spectral patterns to group superpoints into semantic classes, yielding accurate pseudo-labels that are used to train a segmentation network in a fully unsupervised framework. Experiments on two indoor benchmarks (ScanNet and S3DIS) and one outdoor benchmark (SemanticKITTI) show state-of-the-art (SOTA) performance, surpassing all prior unsupervised methods by large margins. Moreover, the learned spectral patterns correspond to meaningful 3D semantics even though no human labels are used during training.
📝 Abstract
We study the problem of unsupervised 3D semantic segmentation on raw point clouds, without human labels in training. Existing methods usually formulate this problem as learning per-point local features followed by a simple grouping strategy, and thus cannot discover additional and possibly richer semantic priors beyond local features. In this paper, we introduce LogoSP to learn 3D semantics from both local and global point features. The key to our approach is to discover 3D semantic information by grouping superpoints according to their global patterns in the frequency domain, thus generating highly accurate semantic pseudo-labels for training a segmentation network. Extensive experiments on two indoor datasets and one outdoor dataset show that LogoSP surpasses all existing unsupervised methods by large margins, achieving state-of-the-art performance for unsupervised 3D semantic segmentation. Notably, our investigation into the learned global patterns reveals that they represent meaningful 3D semantics even though no human labels are used during training.
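To make the idea of grouping superpoints by frequency-domain patterns more concrete, the following is a minimal sketch that is not taken from the paper. It swaps in plain spectral clustering as a stand-in: it builds a k-nearest-neighbour graph over superpoint centroids, uses the low-frequency eigenvectors of the graph Laplacian (a graph Fourier basis) as a global descriptor for each superpoint, and clusters those descriptors with k-means to obtain pseudo-labels. The function name `spectral_pseudo_labels`, its parameters, and the choice of centroids as input are all assumptions made for illustration; the abstract does not specify how LogoSP computes its global patterns.

```python
import numpy as np

def spectral_pseudo_labels(centroids, n_clusters=2, k=4, n_iters=20):
    """Hypothetical helper: cluster superpoints by low-frequency graph
    Fourier components. An illustrative stand-in, not LogoSP itself."""
    n = len(centroids)
    # Pairwise Euclidean distances between superpoint centroids.
    d = np.linalg.norm(centroids[:, None, :] - centroids[None, :, :], axis=-1)
    # Symmetric k-NN adjacency (column 0 of argsort is the point itself).
    nn = np.argsort(d, axis=1)[:, 1:k + 1]
    adj = np.zeros((n, n))
    adj[np.repeat(np.arange(n), k), nn.ravel()] = 1.0
    adj = np.maximum(adj, adj.T)
    # Unnormalised graph Laplacian; its eigenvectors form a graph Fourier basis.
    lap = np.diag(adj.sum(axis=1)) - adj
    _, vecs = np.linalg.eigh(lap)       # eigenvalues come back in ascending order
    emb = vecs[:, :n_clusters]          # lowest-frequency components per superpoint
    # Plain k-means with deterministic farthest-point initialisation.
    centers = [emb[0]]
    for _ in range(1, n_clusters):
        dist = np.min([np.linalg.norm(emb - c, axis=1) for c in centers], axis=0)
        centers.append(emb[np.argmax(dist)])
    centers = np.array(centers)
    for _ in range(n_iters):
        labels = np.argmin(
            np.linalg.norm(emb[:, None] - centers[None], axis=-1), axis=1)
        for j in range(n_clusters):
            if np.any(labels == j):
                centers[j] = emb[labels == j].mean(axis=0)
    return labels
```

In a pipeline of the kind the abstract describes, labels like these would be propagated from superpoints to their member points and used as training targets for a segmentation network. The graph construction, the number of spectral components, and the clustering method are all design choices that the actual method may make differently.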