🤖 AI Summary
Existing unsupervised anomaly detection (UAD) methods face two key challenges: (1) multi-class models substantially underperform single-class state-of-the-art (SOTA) approaches, and (2) the field has fragmented into specialized solutions for individual settings (multi-class, 3D, few-shot, and so on), which hinders unified deployment. To address these, we propose Dinomaly2, the first unified UAD framework for the full spectrum of image modalities (2D, multi-view, RGB-3D, RGB-IR). Guided by a "less is more" philosophy, it combines five simple elements within a standard reconstruction-based framework and extends across tasks without modification. Dinomaly2 achieves SOTA results across 12 benchmarks: 99.9% and 99.3% image-level AUROC on MVTec-AD and VisA (multi-class), respectively. With only eight normal samples per class, it still reaches 98.7% and 97.4% image-level AUROC, surpassing prior full-shot methods. These results suggest that a minimal design is what makes broad generality possible.
📝 Abstract
Unsupervised anomaly detection (UAD) has evolved from building specialized single-class models to unified multi-class models, yet existing multi-class models significantly underperform the most advanced one-for-one counterparts. Moreover, the field has fragmented into specialized methods tailored to specific scenarios (multi-class, 3D, few-shot, etc.), creating deployment barriers and highlighting the need for a unified solution. In this paper, we present Dinomaly2, the first unified framework for full-spectrum image UAD, which bridges the performance gap in multi-class models while seamlessly extending across diverse data modalities and task settings. Guided by the "less is more" philosophy, we demonstrate that the orchestration of five simple elements achieves superior performance in a standard reconstruction-based framework. This methodological minimalism enables natural extension across diverse tasks without modification, establishing that simplicity is the foundation of true universality. Extensive experiments on 12 UAD benchmarks demonstrate Dinomaly2's full-spectrum superiority across multiple modalities (2D, multi-view, RGB-3D, RGB-IR), task settings (single-class, multi-class, inference-unified multi-class, few-shot), and application domains (industrial, biological, outdoor). For example, our multi-class model achieves an unprecedented 99.9% and 99.3% image-level (I-) AUROC on MVTec-AD and VisA, respectively. For multi-view and multi-modal inspection, Dinomaly2 demonstrates state-of-the-art performance with minimal adaptations. Moreover, using only 8 normal examples per class, our method surpasses previous full-shot models, achieving 98.7% and 97.4% I-AUROC on MVTec-AD and VisA. The combination of minimalistic design, computational scalability, and universal applicability positions Dinomaly2 as a unified solution for the full spectrum of real-world anomaly detection applications.
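For readers unfamiliar with the headline metric, the following is a minimal sketch of how an image-level AUROC can be computed from per-pixel anomaly maps, such as the reconstruction-error maps a reconstruction-based detector produces. It is not taken from the paper. The function names, the toy maps, and the choice of the per-image maximum as the image score are illustrative assumptions; the abstract does not say how Dinomaly2 pools its maps.

```python
def image_score(anomaly_map):
    """Reduce a 2D per-pixel anomaly map to one score per image.

    Assumption: the maximum pixel value is used; other pooling rules exist.
    """
    return max(max(row) for row in anomaly_map)


def image_level_auroc(scores, labels):
    """AUROC as the probability that a random anomalous image scores
    higher than a random normal image (ties count as one half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]  # anomalous images
    neg = [s for s, y in zip(scores, labels) if y == 0]  # normal images
    if not pos or not neg:
        raise ValueError("need at least one normal and one anomalous image")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy, made-up 2x2 anomaly maps (not real benchmark data).
maps = [
    [[0.10, 0.20], [0.10, 0.10]],  # normal
    [[0.20, 0.30], [0.20, 0.10]],  # normal
    [[0.10, 0.90], [0.20, 0.10]],  # anomalous
    [[0.25, 0.20], [0.10, 0.10]],  # anomalous, but only weakly flagged
]
labels = [0, 0, 1, 1]

scores = [image_score(m) for m in maps]
auroc = image_level_auroc(scores, labels)
print(f"I-AUROC = {auroc:.2f}")
```

In this toy case, one anomalous image scores below one normal image, so three of the four anomalous/normal pairs are ranked correctly. An I-AUROC of 99.9% means almost every such pair in the test set is ranked correctly.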