FiLo++: Zero-/Few-Shot Anomaly Detection by Fused Fine-Grained Descriptions and Deformable Localization

📅 2025-01-17
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This paper addresses key challenges in zero- and few-shot anomaly detection, including the absence of normal samples from the target class, difficulty in fine-grained localization, and imprecise anomaly descriptions, by proposing FiLo++, a framework with two core components. The first, Fused Fine-Grained Descriptions (FusDes), uses large language models to generate class-adaptive, fine-grained anomaly descriptions, combining fixed and learnable prompt templates with runtime prompt filtering and requiring no normal data from the target class. The second, Deformable Localization (DefLoc), integrates the vision foundation model Grounding DINO with position-enhanced text descriptions and a Multi-scale Deformable Cross-modal Interaction (MDCI) module, enabling accurate pixel-level localization even for irregularly shaped anomaly regions; a position-enhanced patch matching scheme further improves few-shot detection. Extensive experiments on multiple benchmarks show substantial improvements over state-of-the-art methods in both detection accuracy and localization robustness under zero- and few-shot settings.
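
The summary describes detection by comparing an image against normal and abnormal text descriptions in a shared vision-language embedding space. A minimal sketch of that general recipe, using open_clip and hand-written prompts as stand-ins for LLM-generated fine-grained descriptions, might look like the following; model choice, prompts, and the scoring rule are illustrative assumptions, not the FiLo++ implementation.

```python
import torch
import open_clip
from PIL import Image

# Load a CLIP backbone; the model name and prompts below are illustrative assumptions.
model, _, preprocess = open_clip.create_model_and_transforms("ViT-B-16", pretrained="openai")
tokenizer = open_clip.get_tokenizer("ViT-B-16")
model.eval()

def anomaly_score(image: Image.Image, class_name: str,
                  normal_prompts, abnormal_prompts) -> float:
    """Image-level anomaly score from image-text similarity (higher = more anomalous)."""
    with torch.no_grad():
        img_feat = model.encode_image(preprocess(image).unsqueeze(0))
        img_feat = img_feat / img_feat.norm(dim=-1, keepdim=True)

        prompts = [p.format(class_name) for p in normal_prompts + abnormal_prompts]
        txt_feat = model.encode_text(tokenizer(prompts))
        txt_feat = txt_feat / txt_feat.norm(dim=-1, keepdim=True)

        sims = (img_feat @ txt_feat.T).squeeze(0)
        normal_sim = sims[: len(normal_prompts)].max()
        abnormal_sim = sims[len(normal_prompts):].max()
        # Softmax over the two hypotheses yields a probability-like anomaly score.
        return torch.softmax(torch.stack([normal_sim, abnormal_sim]), dim=0)[1].item()

# Hypothetical prompts standing in for LLM-generated, class-specific descriptions.
normal_prompts = ["a photo of a flawless {}", "a photo of a {} without any defect"]
abnormal_prompts = ["a photo of a {} with a scratch", "a photo of a broken {}"]
# score = anomaly_score(Image.open("sample.png").convert("RGB"), "bottle",
#                       normal_prompts, abnormal_prompts)
```
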

📝 Abstract
Anomaly detection methods typically require extensive normal samples from the target class for training, limiting their applicability in scenarios that require rapid adaptation, such as cold start. Zero-shot and few-shot anomaly detection do not require labeled samples from the target class in advance, making them a promising research direction. Existing zero-shot and few-shot approaches often leverage powerful multimodal models to detect and localize anomalies by comparing image-text similarity. However, their handcrafted generic descriptions fail to capture the diverse range of anomalies that may emerge in different objects, and simple patch-level image-text matching often struggles to localize anomalous regions of varying shapes and sizes. To address these issues, this paper proposes the FiLo++ method, which consists of two key components. The first component, Fused Fine-Grained Descriptions (FusDes), utilizes large language models to generate anomaly descriptions for each object category, combines both fixed and learnable prompt templates and applies a runtime prompt filtering method, producing more accurate and task-specific textual descriptions. The second component, Deformable Localization (DefLoc), integrates the vision foundation model Grounding DINO with position-enhanced text descriptions and a Multi-scale Deformable Cross-modal Interaction (MDCI) module, enabling accurate localization of anomalies with various shapes and sizes. In addition, we design a position-enhanced patch matching approach to improve few-shot anomaly detection performance. Experiments on multiple datasets demonstrate that FiLo++ achieves significant performance improvements compared with existing methods. Code will be available at https://github.com/CASIA-IVA-Lab/FiLo.
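
The abstract notes that existing zero-shot approaches rely on simple patch-level image-text matching, which is the baseline mechanism DefLoc refines. A hedged sketch of that baseline, with assumed tensor shapes and a per-patch softmax scoring rule, could be:

```python
import torch
import torch.nn.functional as F

def patch_anomaly_map(patch_feats: torch.Tensor,
                      normal_txt: torch.Tensor,
                      abnormal_txt: torch.Tensor,
                      image_size: int) -> torch.Tensor:
    """
    patch_feats  : (N, D) L2-normalized patch embeddings on a sqrt(N) x sqrt(N) grid
    normal_txt   : (K1, D) L2-normalized embeddings of normal descriptions
    abnormal_txt : (K2, D) L2-normalized embeddings of abnormal descriptions
    Returns an (image_size, image_size) anomaly map with values in [0, 1].
    """
    # Each patch keeps its best-matching normal and abnormal description.
    n_sim = (patch_feats @ normal_txt.T).max(dim=-1).values    # (N,)
    a_sim = (patch_feats @ abnormal_txt.T).max(dim=-1).values  # (N,)
    # Per-patch softmax over the two hypotheses gives an anomaly probability.
    score = torch.softmax(torch.stack([n_sim, a_sim], dim=-1), dim=-1)[..., 1]

    side = int(score.numel() ** 0.5)
    amap = score.view(1, 1, side, side)
    # Upsample the patch grid to pixel resolution for a dense anomaly map.
    amap = F.interpolate(amap, size=(image_size, image_size),
                         mode="bilinear", align_corners=False)
    return amap[0, 0]
```
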
Problem

Research questions and friction points this paper is trying to address.

Anomaly Detection
Zero-shot Learning
Few-shot Learning
Innovation

Methods, ideas, or system contributions that make the work stand out.

FiLo++
Grounding DINO
Zero-shot/Few-shot Anomaly Detection
Zhaopeng Gu
Foundation Model Research Center, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China; School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing 100049, China
Bingke Zhu
Institute of Automation, Chinese Academy of Sciences
Guibo Zhu
Institute of Automation, Chinese Academy of Sciences
Artificial Intelligence · Computer Vision · Machine Learning
Yingying Chen
Foundation Model Research Center, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China; Objecteye Inc., Beijing 100190, China
Ming Tang
Foundation Model Research Center, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China; School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing 100049, China
Jinqiao Wang
Foundation Model Research Center, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China; School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing 100049, China; Wuhan AI Research, Wuhan 430073, China; Peng Cheng Laboratory, Shenzhen 518066, China