GS-CLIP: Zero-shot 3D Anomaly Detection by Geometry-Aware Prompt and Synergistic View Representation Learning

📅 2026-02-22

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses zero-shot anomaly detection in 3D point clouds, where no target-domain training data is available, with a two-stage learning framework. In the first stage, a Geometric Defect Distillation Module (GDDM) distills global shape context and local defect information into text prompts that carry 3D geometric priors. In the second stage, a Synergistic View Representation Learning architecture processes rendered RGB and depth images from multiple viewpoints in parallel, and a Synergistic Refinement Module (SRM) fuses the features of the two streams to sharpen anomaly discrimination. Combining geometry-aware prompts with two-stream view representations is intended to preserve fine geometric detail and to exploit the complementary information in the two image types, which existing methods that rely on a single 2D modality cannot do. The authors report state-of-the-art performance among zero-shot approaches on four public 3D anomaly detection benchmarks.
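As a rough illustration of how a CLIP-style model can turn two image streams into one anomaly score, the sketch below compares each view's feature vector with a "normal" and an "abnormal" text embedding and then averages the two streams. It is not the GS-CLIP implementation: the function names (`cosine`, `anomaly_prob`, `fuse_views`), the toy 3-dimensional features, and the temperature of 0.07 are all assumptions made for illustration, and the plain average is only a stand-in for the learned fusion performed by the SRM.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def anomaly_prob(image_feat, normal_text, abnormal_text, temperature=0.07):
    """Softmax over the similarities to the two prompts; returns P(abnormal).

    The temperature value is an assumed CLIP-style default, not taken
    from the paper.
    """
    s_normal = cosine(image_feat, normal_text) / temperature
    s_abnormal = cosine(image_feat, abnormal_text) / temperature
    m = max(s_normal, s_abnormal)  # subtract the max for numerical stability
    e_normal = math.exp(s_normal - m)
    e_abnormal = math.exp(s_abnormal - m)
    return e_abnormal / (e_normal + e_abnormal)

def fuse_views(rgb_feats, depth_feats, normal_text, abnormal_text):
    """Average the per-view scores of both streams.

    Hypothetical stand-in for a learned fusion module: each view
    contributes the mean of its RGB-stream and depth-stream scores.
    """
    per_view = []
    for rgb, depth in zip(rgb_feats, depth_feats):
        p_rgb = anomaly_prob(rgb, normal_text, abnormal_text)
        p_depth = anomaly_prob(depth, normal_text, abnormal_text)
        per_view.append(0.5 * (p_rgb + p_depth))
    return sum(per_view) / len(per_view)

# Toy embeddings: axis 0 stands for "normal", axis 1 for "abnormal".
normal_text = [1.0, 0.0, 0.0]
abnormal_text = [0.0, 1.0, 0.0]
rgb_feats = [[0.9, 0.1, 0.0], [0.8, 0.2, 0.0]]    # RGB views look normal
depth_feats = [[0.2, 0.9, 0.0], [0.3, 0.8, 0.0]]  # depth views reveal a defect
print(fuse_views(rgb_feats, depth_feats, normal_text, abnormal_text))
```

In this toy setup the depth stream flags a defect that the RGB stream misses, so the fused score lands between the two, which is the kind of complementary signal a two-stream design is meant to capture.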

📝 Abstract

Zero-shot 3D Anomaly Detection is an emerging task that aims to detect anomalies in a target dataset without any target training data, which is particularly important in scenarios constrained by sample scarcity and data privacy concerns. While current methods adapt CLIP by projecting 3D point clouds into 2D representations, they face two challenges: the projection inherently loses some geometric details, and the reliance on a single 2D modality provides an incomplete visual understanding, limiting their ability to detect diverse anomaly types. To address these limitations, we propose the Geometry-Aware Prompt and Synergistic View Representation Learning (GS-CLIP) framework, which enables the model to identify geometric anomalies through a two-stage learning process. In stage 1, we dynamically generate text prompts embedded with 3D geometric priors. These prompts contain global shape context and local defect information distilled by our Geometric Defect Distillation Module (GDDM). In stage 2, we introduce a Synergistic View Representation Learning architecture that processes rendered and depth images in parallel. A Synergistic Refinement Module (SRM) subsequently fuses the features of both streams, capitalizing on their complementary strengths. Comprehensive experimental results on four large-scale public datasets show that GS-CLIP achieves superior detection performance. Code is available at https://github.com/zhushengxinyue/GS-CLIP.
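To make the point about lost geometric detail concrete, here is a minimal sketch of an orthographic depth projection that keeps only the nearest point per pixel. It is not the renderer used in the paper: the function names (`render_depth`, `count_visible`), the 4x4 grid, and the assumption that coordinates lie in [0, 1) are all hypothetical choices for illustration.

```python
def render_depth(points, size=4):
    """Project points in [0, 1)^3 along the z axis onto a size x size grid.

    Each pixel keeps the smallest z (the nearest point to a camera looking
    along +z); pixels that receive no point stay None.
    """
    depth = [[None] * size for _ in range(size)]
    for x, y, z in points:
        col = min(int(x * size), size - 1)
        row = min(int(y * size), size - 1)
        if depth[row][col] is None or z < depth[row][col]:
            depth[row][col] = z
    return depth

def count_visible(depth):
    """Number of pixels that received at least one point."""
    return sum(1 for row in depth for d in row if d is not None)

# Three points; the second sits directly behind the first from this viewpoint.
points = [(0.1, 0.1, 0.2), (0.1, 0.1, 0.8), (0.6, 0.6, 0.5)]
depth = render_depth(points)
print(len(points), count_visible(depth))  # → 3 2
print(depth[0][0], depth[2][2])           # → 0.2 0.5
```

The occluded point at z = 0.8 leaves no trace in the depth image, so a defect located there would be invisible from this viewpoint. This is one reason projection-based methods render several viewpoints, and why carrying 3D priors alongside the 2D images can help.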

Problem

Research questions and friction points this paper is trying to address.

Zero-shot 3D Anomaly Detection

3D point clouds

geometric details

data privacy

sample scarcity

Innovation

Methods, ideas, or system contributions that make the work stand out.

Geometry-Aware Prompt

Synergistic View Representation Learning

Zero-shot 3D Anomaly Detection