🤖 AI Summary
Existing zero-shot 3D point cloud semantic segmentation methods suffer from a large vision–semantics gap and poor generalization from seen to unseen classes. To address these bottlenecks, we propose 3D-PointZshotS, a geometry-aware zero-shot segmentation framework. Our key contributions are: (1) latent geometric prototypes (LGPs), which explicitly encode structural priors of point clouds to enhance semantic-to-visual feature generation and alignment; (2) a self-consistency loss that improves feature robustness to point-wise perturbations; and (3) a unified vision–semantics embedding space that enables knowledge transfer to unseen classes. By integrating LGPs into the generator via cross-attention and jointly projecting visual and semantic features, our method achieves substantial improvements over four baselines on ScanNet, SemanticKITTI, and S3DIS, yielding average harmonic mIoU gains of 3.2–5.7 percentage points. The code is publicly available.
📝 Abstract
Existing zero-shot 3D point cloud segmentation methods often struggle with limited transferability from seen classes to unseen classes and from semantic to visual space. To alleviate this, we introduce 3D-PointZshotS, a geometry-aware zero-shot segmentation framework that enhances both feature generation and alignment using latent geometric prototypes (LGPs). Specifically, we integrate LGPs into a generator via a cross-attention mechanism, enriching semantic features with fine-grained geometric details. To further enhance stability and generalization, we introduce a self-consistency loss, which enforces feature robustness against point-wise perturbations. Additionally, we re-represent visual and semantic features in a shared space, bridging the semantic-visual gap and facilitating knowledge transfer to unseen classes. Experiments on three real-world datasets, namely ScanNet, SemanticKITTI, and S3DIS, demonstrate that our method achieves superior performance over four baselines in terms of harmonic mIoU. The code is available at [GitHub](https://github.com/LexieYang/3D-PointZshotS).
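As a rough illustration only (not the paper's implementation), the two mechanisms described above — cross-attention from semantic features onto a bank of latent geometric prototypes, and a self-consistency loss that penalizes feature drift under point-wise perturbations — can be sketched in NumPy. All shapes, dimensions, and function names here are hypothetical assumptions for the sketch:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attend(semantic, prototypes):
    """Cross-attention sketch: queries are semantic class features,
    keys/values are latent geometric prototypes (LGPs).
    Returns geometry-enriched semantic features (residual form)."""
    d = semantic.shape[-1]
    attn = softmax(semantic @ prototypes.T / np.sqrt(d))  # (C, P) weights
    return semantic + attn @ prototypes                   # (C, d) enriched

def self_consistency_loss(feats, feats_perturbed):
    """Penalize divergence between features of a point cloud and a
    point-wise perturbed copy (MSE stand-in for the paper's loss)."""
    return np.mean((feats - feats_perturbed) ** 2)

rng = np.random.default_rng(0)
sem = rng.normal(size=(5, 16))   # 5 class embeddings, hypothetical dim 16
lgp = rng.normal(size=(8, 16))   # 8 latent geometric prototypes
enriched = cross_attend(sem, lgp)
noisy = enriched + 0.01 * rng.normal(size=enriched.shape)
loss = self_consistency_loss(enriched, noisy)
```

The residual form keeps the original semantic signal while mixing in prototype geometry; the actual generator, prototype learning, and loss weighting in 3D-PointZshotS are more involved than this toy version.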