🤖 AI Summary
To address weak discriminability and low localization accuracy in general-purpose industrial anomaly detection, this paper proposes the Self-Navigated Residual Mamba (SNARM) framework. Methodologically, it introduces, for the first time, an intra-image self-reference mechanism that dynamically selects reference patches within the test image to enhance anomaly sensitivity; designs a residual-guided multi-head Mamba module that leverages state-space modeling to capture long-range patch dependencies and adaptively focus on anomalous regions; and combines inter-residual and intra-residual feature extraction with adaptive reference selection and ensemble aggregation. Evaluated on three major benchmarks (MVTec AD, MVTec 3D, and VisA), SNARM achieves new state-of-the-art performance across all key metrics: Image-AUROC, Pixel-AUROC, PRO, and AP. The results demonstrate superior fine-grained localization accuracy and strong generalization across diverse industrial defect types and modalities.
📝 Abstract
In this paper, we propose Self-Navigated Residual Mamba (SNARM), a novel framework for universal industrial anomaly detection that leverages "self-referential learning" within test images to enhance anomaly discrimination. Unlike conventional methods that depend solely on pre-trained features from normal training data, SNARM dynamically refines anomaly detection by iteratively comparing test patches against adaptively selected in-image references. Specifically, we first compute "inter-residual" features by contrasting test image patches with the training feature bank. Patches exhibiting small-norm residuals (indicating high normality) are then utilized as self-generated reference patches to compute "intra-residuals", amplifying discriminative signals. These inter- and intra-residual features are concatenated and fed into a novel Mamba module with multiple heads, which are dynamically navigated by residual properties to focus on anomalous regions. Finally, AD results are obtained by aggregating the outputs of the self-navigated Mamba in an ensemble learning paradigm. Extensive experiments on the MVTec AD, MVTec 3D, and VisA benchmarks demonstrate that SNARM achieves state-of-the-art (SOTA) performance, with notable improvements in all metrics, including Image-AUROC, Pixel-AUROC, PRO, and AP.
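The inter-/intra-residual pipeline described in the abstract can be sketched as follows. This is a minimal NumPy illustration under assumed shapes and a simple nearest-neighbor residual; the feature dimensions, the 10% reference-selection ratio, and all variable names are hypothetical, and the downstream multi-head Mamba module is not modeled here.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical inputs: a bank of normal patch features (from training data)
# and the patch features of one test image.
bank = rng.normal(size=(500, 64))     # training feature bank, (N_bank, D)
patches = rng.normal(size=(196, 64))  # test-image patch features, (N_patch, D)

# Inter-residuals: each test patch minus its nearest neighbor in the bank.
d2 = ((patches[:, None, :] - bank[None, :, :]) ** 2).sum(-1)  # (N_patch, N_bank)
inter_res = patches - bank[d2.argmin(axis=1)]

# Patches with small-norm inter-residuals are treated as highly normal and
# reused as in-image reference patches (assumed ratio: top 10%).
norms = np.linalg.norm(inter_res, axis=1)
ref_idx = np.argsort(norms)[: int(0.1 * len(patches))]
refs = patches[ref_idx]

# Intra-residuals: each test patch minus its nearest in-image reference.
d2_intra = ((patches[:, None, :] - refs[None, :, :]) ** 2).sum(-1)
intra_res = patches - refs[d2_intra.argmin(axis=1)]

# Concatenated residual features would then feed the multi-head Mamba module.
features = np.concatenate([inter_res, intra_res], axis=1)  # (N_patch, 2D)
print(features.shape)
```

Note that the intra-residuals of the selected reference patches are exactly zero (each reference's nearest reference is itself), which is consistent with the idea that these residuals amplify the signal of anomalous patches relative to normal ones.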