SAM-LAD: Segment Anything Model Meets Zero-Shot Logic Anomaly Detection

📅 2024-06-02

🏛️ arXiv.org

📈 Citations: 1

✨ Influential: 0

🤖 AI Summary

Existing visual anomaly detection methods rely on low-level pixel or feature statistics and therefore struggle to model higher-order logical anomalies, such as missing or redundant components. To address this, we propose SAM-LAD, a zero-shot logical anomaly detection framework. Our method uses the Segment Anything Model (SAM) to generate object masks, then introduces an Object Matching Model (OMM) with a Dynamic Channel Graph Attention (DCGA) module to match objects between the query and reference images, and an Anomaly Measurement Model (AMM) to detect logical and structural anomalies from those matches. The framework is zero-shot and plug-and-play: it relies on a pre-trained vision backbone and nearest-neighbor retrieval of reference images to generalize across domains. Evaluated on the MVTec LOCO AD, MVTec AD, and DigitAnatomy benchmarks, our approach achieves logical anomaly detection AUC scores surpassing state-of-the-art methods by over 12%. It targets industrial defect inspection and medical diagnosis applications.

📝 Abstract

Visual anomaly detection is vital in real-world applications, such as industrial defect detection and medical diagnosis. However, most existing methods focus on local structural anomalies and fail to detect higher-level functional anomalies under logical conditions. Although recent studies have explored logical anomaly detection, they can only address simple anomalies, such as missing or added components, and generalize poorly because they are heavily data-driven. To fill this gap, we propose SAM-LAD, a zero-shot, plug-and-play framework for logical anomaly detection in any scene. First, we obtain a query image's feature map using a pre-trained backbone. Simultaneously, we retrieve the reference images and their corresponding feature maps via a nearest-neighbor search for the query image. Then, we introduce the Segment Anything Model (SAM) to obtain object masks of the query and reference images. Each object mask is multiplied with the entire image's feature map to obtain object feature maps. Next, an Object Matching Model (OMM) is proposed to match objects in the query and reference images. To facilitate object matching, we further propose a Dynamic Channel Graph Attention (DCGA) module, treating each object as a keypoint and converting its feature maps into feature vectors. Finally, based on the object matching relations, an Anomaly Measurement Model (AMM) is proposed to detect objects with logical anomalies. Structural anomalies in the objects can also be detected. We validate our proposed SAM-LAD on various benchmarks, including industrial datasets (MVTec LOCO AD, MVTec AD) and a logical dataset (DigitAnatomy). Extensive experimental results demonstrate that SAM-LAD outperforms existing SoTA methods, particularly in detecting logical anomalies.
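The first steps of the abstract (reference retrieval by nearest-neighbor search, then multiplying each object mask with the image feature map) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names are hypothetical, the feature map is a plain nested list of shape C x H x W rather than a backbone output, and the mean pooling in `masked_average` is only an assumed stand-in for the DCGA step that turns object feature maps into vectors.

```python
import math


def nearest_references(query_vec, reference_vecs, k=1):
    """Return indices of the k reference descriptors closest to the query.

    Assumption: each image is summarized by one global descriptor and
    Euclidean distance is used; the abstract does not specify either.
    """
    dists = [math.dist(query_vec, ref) for ref in reference_vecs]
    return sorted(range(len(reference_vecs)), key=lambda i: dists[i])[:k]


def object_feature_map(feature_map, mask):
    """Multiply every channel of a C x H x W feature map by a binary H x W mask."""
    return [
        [[value * m for value, m in zip(row, mask_row)]
         for row, mask_row in zip(channel, mask)]
        for channel in feature_map
    ]


def masked_average(feature_map, mask):
    """Pool an object's masked features into one vector (mean over mask pixels).

    Assumption: simple average pooling, used here only for illustration.
    """
    area = sum(sum(row) for row in mask)
    if area == 0:
        return [0.0] * len(feature_map)
    masked = object_feature_map(feature_map, mask)
    return [sum(sum(row) for row in channel) / area for channel in masked]


# Toy data: a 2-channel, 2x2 feature map and one diagonal object mask.
features = [[[1, 2], [3, 4]], [[10, 20], [30, 40]]]
mask = [[1, 0], [0, 1]]
object_vector = masked_average(features, mask)
closest = nearest_references([0, 0], [[3, 4], [1, 0], [0, 2]], k=2)
```

In a real pipeline the masks would come from SAM and the feature maps from the pre-trained backbone, with the mask resized to the feature map's spatial resolution before the multiplication.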

Problem

Research questions and friction points this paper is trying to address.

Detect logical anomalies in images

Improve generalizability in anomaly detection

Integrate zero-shot learning with SAM

Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses Segment Anything Model for object masks

Implements Dynamic Channel Graph Attention

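Proposes an Object Matching Model (OMM) to match query and reference objects, and an Anomaly Measurement Model (AMM) to detect anomalies from those matches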
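To make the idea of detecting logical anomalies from object matches concrete, here is a simplified sketch. It is not the paper's OMM or AMM: it swaps in mutual nearest-neighbor matching on cosine similarity, and the function names and the `min_similarity` threshold are hypothetical. Objects left unmatched on either side are treated as candidates for "missing" or "added" components.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)


def match_objects(query_vecs, reference_vecs, min_similarity=0.5):
    """Match objects by mutual nearest neighbor; return matches and leftovers.

    Returns (matches, unmatched_query, unmatched_reference), where matches is
    a list of (query_index, reference_index) pairs.
    """
    n_q, n_r = len(query_vecs), len(reference_vecs)
    if n_q == 0 or n_r == 0:
        return [], list(range(n_q)), list(range(n_r))

    sim = [[cosine(q, r) for r in reference_vecs] for q in query_vecs]
    # Best reference for each query object, and best query for each reference.
    best_ref = [max(range(n_r), key=lambda j: row[j]) for row in sim]
    best_query = [max(range(n_q), key=lambda i: sim[i][j]) for j in range(n_r)]

    # Keep a pair only if each side picks the other and they are similar enough.
    matches = [
        (i, j) for i, j in enumerate(best_ref)
        if best_query[j] == i and sim[i][j] >= min_similarity
    ]
    matched_q = {i for i, _ in matches}
    matched_r = {j for _, j in matches}
    unmatched_query = [i for i in range(n_q) if i not in matched_q]
    unmatched_reference = [j for j in range(n_r) if j not in matched_r]
    return matches, unmatched_query, unmatched_reference


# Toy example: the reference image has three objects, the query only two.
reference = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
query = [[0.9, 0.1, 0.0], [0.0, 0.2, 0.9]]
matches, extra_in_query, missing_in_query = match_objects(query, reference)
```

Here reference object 1 has no counterpart in the query, so it is reported as a candidate missing component. A learned matcher such as the OMM would replace the fixed similarity rule used in this sketch.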
