Bayesian Prompt Flow Learning for Zero-Shot Anomaly Detection

📅 2025-03-13
📈 Citations: 0
Influential: 0
🤖 AI Summary
In zero-shot anomaly detection, handcrafted prompts rely heavily on expert knowledge, single prompts struggle to capture complex anomaly semantics, and unconstrained prompt spaces hinder cross-category generalization. To address these challenges, we propose the first Bayesian-inspired prompt flow learning framework: it jointly models the distributions of image-specific and image-agnostic prompts, enabling diverse prompt generation via learnable sampling; introduces residual cross-attention (RCA) to align fine-grained vision-language features; and leverages CLIP for vision-language collaborative reasoning. The method requires no target-domain annotations and achieves state-of-the-art performance across 15 industrial and medical benchmark datasets, significantly improving zero-shot detection accuracy and cross-category generalization over existing approaches.

📝 Abstract
Recently, vision-language models (e.g., CLIP) have demonstrated remarkable performance in zero-shot anomaly detection (ZSAD). By leveraging auxiliary data during training, these models can directly perform cross-category anomaly detection on target datasets, such as detecting defects on industrial product surfaces or identifying tumors in organ tissues. Existing approaches typically construct text prompts through either manual design or the optimization of learnable prompt vectors. However, these methods face several challenges: 1) handcrafted prompts require extensive expert knowledge and trial-and-error; 2) single-form learnable prompts struggle to capture complex anomaly semantics; and 3) an unconstrained prompt space limits generalization to unseen categories. To address these issues, we propose Bayesian Prompt Flow Learning (Bayes-PFL), which models the prompt space as a learnable probability distribution from a Bayesian perspective. Specifically, a prompt flow module is designed to learn both image-specific and image-agnostic distributions, which are jointly utilized to regularize the text prompt space and enhance the model's generalization on unseen categories. These learned distributions are then sampled to generate diverse text prompts, effectively covering the prompt space. Additionally, a residual cross-attention (RCA) module is introduced to better align dynamic text embeddings with fine-grained image features. Extensive experiments on 15 industrial and medical datasets demonstrate our method's superior performance.
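The sampling step described in the abstract (drawing diverse prompt embeddings from learned image-specific and image-agnostic distributions) can be illustrated with a minimal diagonal-Gaussian stand-in using the reparameterization trick. The paper's flow-based module is more expressive than this sketch, and all function and parameter names below are hypothetical:

```python
import math
import random

def sample_prompt_embedding(mu, log_sigma, eps=None):
    """Reparameterized draw z = mu + sigma * eps from a diagonal Gaussian.

    mu, log_sigma: per-dimension mean and log-std of a learned prompt
    distribution (a simplified stand-in for the paper's prompt flow).
    Keeping the noise eps separate from the parameters lets gradients
    flow through mu and log_sigma during training.
    """
    if eps is None:
        eps = [random.gauss(0.0, 1.0) for _ in mu]
    return [m + math.exp(ls) * e for m, ls, e in zip(mu, log_sigma, eps)]

def combine_prompts(image_specific, image_agnostic):
    """Fuse an image-specific and an image-agnostic sample.

    A simple per-dimension average; the actual fusion in the paper
    may differ.
    """
    return [(a + b) / 2.0 for a, b in zip(image_specific, image_agnostic)]

# Drawing several samples from the same distribution yields a set of
# diverse text-prompt embeddings that jointly cover the prompt space.
mu, log_sigma = [0.5, -1.0], [-1.0, -1.0]
prompts = [sample_prompt_embedding(mu, log_sigma) for _ in range(4)]
```

Repeated sampling is what produces the "diverse text prompts" the abstract mentions: each draw is a different point near the learned mean, so an ensemble of draws covers the regularized prompt space rather than committing to a single fixed prompt vector.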
Problem

Research questions and friction points this paper is trying to address.

Handcrafted prompts demand extensive expert knowledge and trial-and-error.
Single-form learnable prompts struggle to capture complex anomaly semantics.
An unconstrained prompt space limits generalization to unseen categories.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Bayesian Prompt Flow Learning for ZSAD
Learnable probability distribution for prompts
Residual cross-attention for feature alignment
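The residual cross-attention idea listed above (text queries attend over image patch features, with a skip connection back to the text embedding) can be sketched as a toy single-head version in pure Python. Projection matrices and multi-head structure are omitted, so this is a simplified stand-in for the paper's RCA module, not its actual implementation:

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def residual_cross_attention(text_tokens, patch_feats):
    """Single-head cross-attention with a residual connection.

    Each text embedding acts as a query over the image patch features
    (used here as both keys and values), and the attended result is
    added back to the original text embedding, so fine-grained image
    context refines rather than replaces the text semantics.
    """
    d = len(text_tokens[0])
    out = []
    for q in text_tokens:
        scores = softmax([dot(q, k) / math.sqrt(d) for k in patch_feats])
        attended = [sum(w * v[i] for w, v in zip(scores, patch_feats))
                    for i in range(d)]
        out.append([qi + ai for qi, ai in zip(q, attended)])  # residual add
    return out
```

The residual add is the key design choice: even when the attention output is uninformative, the original text embedding passes through unchanged, which keeps the alignment step stable.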
Zhen Qu
Institute of Automation, Chinese Academy of Sciences
Xian Tao
Institute of Automation, Chinese Academy of Sciences; School of Artificial Intelligence, University of Chinese Academy of Sciences
Xinyi Gong
CGG
Spherical Indentation, Additive Manufacturing, High Throughput Experimentation, Materials Characterization, Materials Informatics
Shichen Qu
Institute of Automation, Chinese Academy of Sciences; School of Artificial Intelligence, University of Chinese Academy of Sciences
Qiyu Chen
Institute of Automation, Chinese Academy of Sciences
Anomaly Detection, Computer Vision, Deep Learning
Zhengtao Zhang
Institute of Automation, Chinese Academy of Sciences; School of Artificial Intelligence, University of Chinese Academy of Sciences
Xingang Wang
Institute of Automation, Chinese Academy of Sciences; Luoyang Institute for Robot and Intelligent Equipment
Guiguang Ding
Tsinghua University
Computer Vision, Multimedia Retrieval