EPSegFZ: Efficient Point Cloud Semantic Segmentation for Few- and Zero-Shot Scenarios with Language Guidance

📅 2025-11-12
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
To address two key bottlenecks in few-shot/zero-shot 3D point cloud semantic segmentation—over-reliance on pretraining and underutilization of textual supervision—this paper proposes the first pretraining-free, end-to-end language-guided framework. Our method jointly models visual and linguistic modalities for zero-shot generalization without any pretrained vision or language backbones. Key contributions include: (1) a Language-Guided Prototype Embedding (LGPE) module that aligns category-specific textual descriptions with point-wise features; and (2) Prototype-Enhanced Register Attention (ProERA) coupled with Dual Relative Position Encoding (DRPE), which improves cross-class prototype matching accuracy and scene-level generalizability. Evaluated on S3DIS and ScanNet, our approach achieves new state-of-the-art mIoU scores, outperforming prior methods by +5.68% and +3.82%, respectively. This is the first work to empirically validate the feasibility and superiority of pretraining-free paradigms for few-shot and zero-shot point cloud semantic segmentation.

Technology Category

Application Category

📝 Abstract
Recent approaches for few-shot 3D point cloud semantic segmentation typically require a two-stage learning process, i.e., a pre-training stage followed by a few-shot training stage. While effective, these methods face overreliance on pre-training, which hinders model flexibility and adaptability. Some models tried to avoid pre-training yet failed to capture ample information. In addition, current approaches focus on visual information in the support set and neglect or do not fully exploit other useful data, such as textual annotations. This inadequate utilization of support information impairs the performance of the model and restricts its zero-shot ability. To address these limitations, we present a novel pre-training-free network, named Efficient Point Cloud Semantic Segmentation for Few- and Zero-shot scenarios. Our EPSegFZ incorporates three key components. A Prototype-Enhanced Registers Attention (ProERA) module and a Dual Relative Positional Encoding (DRPE)-based cross-attention mechanism for improved feature extraction and accurate query-prototype correspondence construction without pre-training. A Language-Guided Prototype Embedding (LGPE) module that effectively leverages textual information from the support set to improve few-shot performance and enable zero-shot inference. Extensive experiments show that our method outperforms the state-of-the-art method by 5.68% and 3.82% on the S3DIS and ScanNet benchmarks, respectively.
Problem

Research questions and friction points this paper is trying to address.

Addresses overreliance on pre-training in point cloud segmentation
Improves utilization of textual annotations in support sets
Enables zero-shot inference through language-guided prototype embedding
Innovation

Methods, ideas, or system contributions that make the work stand out.

Pre-training-free network for point cloud segmentation
Language-guided prototype embedding for zero-shot inference
Dual positional encoding for enhanced feature extraction
🔎 Similar Papers
No similar papers found.
J
Jiahui Wang
College of Design and Engineering, National University of Singapore
H
Haiyue Zhu
SIMTech, Agency for Science, Technology and Research (A*STAR)
Haoren Guo
Haoren Guo
PhD Candidate, National University of Singapore
deep learningtime seriesPDM
A
Abdullah Al Mamun
College of Design and Engineering, National University of Singapore
Cheng Xiang
Cheng Xiang
National University of Singapore
Control systemscomputer visionmachine learningartificial intelligence
T
Tong Heng Lee
College of Design and Engineering, National University of Singapore