🤖 AI Summary
Visual anomaly detection suffers from the scarcity of real anomalous samples, while existing synthesis methods often lack realism or rely heavily on annotated data. This paper proposes AnomalyAny, the first framework enabling zero-shot generation of diverse, high-fidelity, cross-category unseen anomalous images at test time, requiring only a single normal image and a textual description. Its core innovations are an attention-guided anomaly optimization mechanism and a prompt-driven anomaly refinement strategy, which jointly integrate Stable Diffusion, text-conditional generation, and gradient-based optimization. Evaluated on MVTec AD and VisA, the synthesized anomalies significantly improve downstream detection performance while exhibiting high visual realism and discriminative difficulty. AnomalyAny establishes a novel paradigm for few-shot and zero-shot anomaly detection.
📄 Abstract
Visual anomaly detection (AD) presents significant challenges due to the scarcity of anomalous data samples. While numerous works have been proposed to synthesize anomalous samples, these synthetic anomalies often lack authenticity or require extensive training data, limiting their applicability in real-world scenarios. In this work, we propose Anomaly Anything (AnomalyAny), a novel framework that leverages Stable Diffusion (SD)'s image generation capabilities to generate diverse and realistic unseen anomalies. By conditioning on a single normal sample during test time, AnomalyAny is able to generate unseen anomalies for arbitrary object types with text descriptions. Within AnomalyAny, we propose attention-guided anomaly optimization to direct SD attention toward generating hard anomaly concepts. Additionally, we introduce prompt-guided anomaly refinement, incorporating detailed descriptions to further improve generation quality. Extensive experiments on the MVTec AD and VisA datasets demonstrate AnomalyAny's ability to generate high-quality unseen anomalies and its effectiveness in enhancing downstream AD performance.
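The core idea of attention-guided anomaly optimization, steering generation by gradient updates that increase cross-attention on the anomaly concept, can be illustrated with a minimal toy sketch. This is not the paper's implementation (the actual method operates on Stable Diffusion's cross-attention maps during denoising); here a latent vector, a small set of hypothetical text-token keys, and the chosen anomaly-token index are all illustrative assumptions, and the gradient is estimated by finite differences for self-containment:

```python
import numpy as np

def softmax(x):
    # numerically stable softmax
    e = np.exp(x - x.max())
    return e / e.sum()

def attention_to_token(z, K, idx):
    # toy cross-attention: weight of latent query z on text-token key idx
    return softmax(K @ z)[idx]

def optimize_latent(z, K, idx, lr=0.5, steps=30, eps=1e-4):
    # gradient ascent on attention to the target (anomaly) token,
    # with the gradient estimated by central finite differences
    z = z.copy()
    for _ in range(steps):
        g = np.zeros_like(z)
        for i in range(z.size):
            zp, zm = z.copy(), z.copy()
            zp[i] += eps
            zm[i] -= eps
            g[i] = (attention_to_token(zp, K, idx)
                    - attention_to_token(zm, K, idx)) / (2 * eps)
        z += lr * g
    return z

rng = np.random.default_rng(0)
K = rng.normal(size=(5, 8))     # 5 hypothetical text tokens, dim 8
z0 = rng.normal(size=8)         # initial latent
anomaly_idx = 3                 # token describing the anomaly concept
z1 = optimize_latent(z0, K, anomaly_idx)
print(attention_to_token(z0, K, anomaly_idx),
      attention_to_token(z1, K, anomaly_idx))
```

After optimization, the attention weight on the anomaly token is strictly higher than at initialization, mirroring how the framework directs SD's attention toward hard anomaly concepts during test-time generation.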