🤖 AI Summary
Existing anomaly synthesis methods suffer from discontinuous microstructures, coarse-grained semantic control, and low generation efficiency. To address these issues, we propose ARAS, a language-guided, locally autoregressive anomaly synthesis framework. ARAS enables text-driven, fine-grained anomaly injection via token-anchored latent editing and a hard-gated autoregressive operator. It further introduces a training-free masked sampling kernel and, within the Quality-Aware Re-weighted Anomaly Detection (QARAD) framework, a dynamic re-weighting strategy that emphasizes high-quality synthetic samples using a dual-encoder image-text similarity score. Together, these components enhance synthetic realism and detection robustness. Evaluated on MVTec AD, VisA, and BTAD, the approach achieves state-of-the-art performance in both image-level and pixel-level anomaly detection, synthesizes anomalies five times faster than diffusion-based alternatives, and improves texture fidelity and semantic controllability.
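The summary names a hard-gated autoregressive operator and a masked sampling kernel but does not give the procedure. The following is only a minimal sketch of what hard-gated, masked autoregressive editing over latent tokens could look like. It assumes a raster-ordered token grid and a causal model exposed as a hypothetical `logits_fn(prefix, position)` callable. Positions outside the defect mask are copied through unchanged (the hard gate), and only masked positions are resampled from the model's next-token distribution. `masked_autoregressive_edit`, `logits_fn`, `toy_logits`, and `temperature` are illustrative names, not part of ARAS.

```python
import numpy as np

def masked_autoregressive_edit(tokens, mask, logits_fn, rng, temperature=1.0):
    """Resample masked latent tokens in raster order; copy all others unchanged.

    tokens    : 1-D int array of token ids (a flattened H*W latent grid).
    mask      : 1-D bool array, True where a defect should be synthesized.
    logits_fn : hypothetical causal model, logits_fn(prefix, i) -> logits over
                the vocabulary for position i given the already-edited prefix.
    """
    out = tokens.copy()
    for i in range(out.shape[0]):
        if not mask[i]:
            continue  # hard gate: normal-region tokens are never modified
        logits = np.asarray(logits_fn(out[:i], i), dtype=float) / temperature
        probs = np.exp(logits - logits.max())  # numerically stable softmax
        probs /= probs.sum()
        out[i] = rng.choice(probs.shape[0], p=probs)
    return out


# Toy stand-in for a text-conditioned model: always predicts token 7.
def toy_logits(prefix, i, vocab=10):
    logits = np.full(vocab, -np.inf)
    logits[7] = 0.0
    return logits


tokens = np.arange(6)
mask = np.array([False, True, False, False, True, False])
edited = masked_autoregressive_edit(tokens, mask, toy_logits, np.random.default_rng(0))
print(edited)  # [0 7 2 3 7 5]
```

In this sketch the hard gate means the normal region's tokens are reproduced exactly, and the number of model calls scales with the masked area rather than with the whole image.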
📝 Abstract
Despite substantial progress in anomaly synthesis, existing diffusion-based and coarse inpainting pipelines commonly suffer from structural deficiencies such as micro-structural discontinuities, limited semantic controllability, and inefficient generation. To overcome these limitations, we introduce ARAS, a language-conditioned, autoregressive anomaly synthesis approach that precisely injects local, text-specified defects into normal images via token-anchored latent editing. Leveraging a hard-gated autoregressive operator and a training-free, context-preserving masked sampling kernel, ARAS significantly enhances defect realism, preserves fine-grained material textures, and provides continuous semantic control over synthesized anomalies. Within our Quality-Aware Re-weighted Anomaly Detection (QARAD) framework, we further propose a dynamic weighting strategy that emphasizes high-quality synthetic samples by computing an image-text similarity score with a dual-encoder model. Extensive experiments on three benchmark datasets (MVTec AD, VisA, and BTAD) demonstrate that QARAD outperforms state-of-the-art methods in both image- and pixel-level anomaly detection, achieving improved accuracy and robustness and a 5× synthesis speedup over diffusion-based alternatives. Our complete code and synthesized dataset will be made publicly available.
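The abstract states only that QARAD weights synthetic samples by a dual-encoder image-text similarity score. It does not specify how scores become weights or which loss they enter. The sketch below assumes precomputed image and text embeddings (for example from a CLIP-style encoder pair) and maps cosine similarity to weights with a temperature-scaled softmax rescaled to mean 1. It then applies those weights to a per-sample binary cross-entropy loss. The softmax mapping, `tau`, and the names `cosine_similarity`, `quality_weights`, and `weighted_bce` are hypothetical choices for illustration.

```python
import numpy as np

def cosine_similarity(img_emb, txt_emb):
    """Row-wise cosine similarity between paired image and text embeddings."""
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    return np.sum(img * txt, axis=1)

def quality_weights(sim, tau=0.1):
    """Hypothetical mapping: softmax over the batch, rescaled to mean 1."""
    z = sim / tau
    w = np.exp(z - z.max())  # subtract max for numerical stability
    return w / w.sum() * len(w)

def weighted_bce(probs, labels, weights, eps=1e-7):
    """Per-sample binary cross-entropy, averaged with quality weights."""
    p = np.clip(probs, eps, 1 - eps)
    loss = -(labels * np.log(p) + (1 - labels) * np.log(1 - p))
    return float(np.mean(weights * loss))


# Two synthetic samples: the first matches its defect prompt, the second does not.
img_emb = np.array([[1.0, 0.0], [0.0, 1.0]])
txt_emb = np.array([[1.0, 0.0], [1.0, 0.0]])
sim = cosine_similarity(img_emb, txt_emb)  # [1.0, 0.0]
w = quality_weights(sim)                   # first weight close to 2, second close to 0
loss = weighted_bce(np.array([0.2, 0.9]), np.array([1.0, 1.0]), w)
```

Rescaling the weights to mean 1 keeps the loss on the same scale as unweighted training (equal similarities give all-ones weights), so only the relative emphasis between high- and low-similarity samples changes.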