Quality-Aware Language-Conditioned Local Auto-Regressive Anomaly Synthesis and Detection

📅 2025-08-05
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing anomaly synthesis methods suffer from discontinuous microstructures, coarse-grained semantic control, and low generation efficiency. To address these issues, the paper proposes ARAS, a language-guided, locally auto-regressive anomaly synthesis framework. ARAS enables text-driven, fine-grained anomaly injection via token-anchored latent editing and a hard-gated auto-regressive operator, combined with a training-free, context-preserving masked sampling kernel. The synthesized samples feed the Quality-Aware Re-weighted Anomaly Detection (QARAD) framework, which dynamically emphasizes high-quality samples using an image-text similarity score computed by a dual-encoder model. On MVTec AD, VisA, and BTAD, QARAD is reported to outperform state-of-the-art methods in both image-level and pixel-level anomaly detection, with synthesis roughly five times faster than diffusion-based alternatives and with improved texture fidelity and semantic controllability.

📝 Abstract
Despite substantial progress in anomaly synthesis methods, existing diffusion-based and coarse inpainting pipelines commonly suffer from structural deficiencies such as micro-structural discontinuities, limited semantic controllability, and inefficient generation. To overcome these limitations, we introduce ARAS, a language-conditioned, auto-regressive anomaly synthesis approach that precisely injects local, text-specified defects into normal images via token-anchored latent editing. Leveraging a hard-gated auto-regressive operator and a training-free, context-preserving masked sampling kernel, ARAS significantly enhances defect realism, preserves fine-grained material textures, and provides continuous semantic control over synthesized anomalies. Within our Quality-Aware Re-weighted Anomaly Detection (QARAD) framework, we further propose a dynamic weighting strategy that emphasizes high-quality synthetic samples by computing an image-text similarity score with a dual-encoder model. Extensive experiments on three benchmark datasets (MVTec AD, VisA, and BTAD) demonstrate that QARAD outperforms state-of-the-art methods in both image- and pixel-level anomaly detection, achieving improved accuracy and robustness together with a 5x synthesis speedup over diffusion-based alternatives. Our complete code and synthesized dataset will be made publicly available.
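The idea of a hard-gated, local auto-regressive edit can be illustrated with a toy sketch. This is not the authors' implementation: the function names, the flat integer token sequence, and the random `toy_sampler` are all hypothetical stand-ins. A real system would presumably sample from a learned auto-regressive prior over latent image tokens, conditioned on the text prompt. The sketch only shows the gating behaviour: tokens outside the mask are copied unchanged, and tokens inside it are resampled in order.

```python
import random


def local_autoregressive_edit(tokens, mask, sample_token, seed=0):
    """Resample only the masked positions, left to right.

    tokens:       list of latent token ids for a normal image (assumed layout)
    mask:         list of bools, True where a defect should be injected
    sample_token: hypothetical callable (prefix, rng) -> new token id
    """
    if len(tokens) != len(mask):
        raise ValueError("tokens and mask must have the same length")
    rng = random.Random(seed)
    edited = list(tokens)  # work on a copy so the input stays intact
    for pos, inside in enumerate(mask):
        if inside:
            # Hard gate: only masked positions are ever rewritten.
            # The prefix holds preserved context plus earlier edits.
            edited[pos] = sample_token(edited[:pos], rng)
    return edited


def toy_sampler(prefix, rng, defect_vocab=(90, 91, 92)):
    """Stand-in for a text-conditioned model: picks a 'defect' token at random."""
    return rng.choice(defect_vocab)


normal = [1, 2, 3, 4, 5, 6]
defect_mask = [False, False, True, True, False, False]
print(local_autoregressive_edit(normal, defect_mask, toy_sampler))
```

Because unmasked positions are never touched, the surrounding texture tokens survive exactly, which is the intuition behind context-preserving local editing. The sketch conditions only on the prefix, a simplification of whatever context the actual sampling kernel uses.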
Problem

Research questions and friction points this paper is trying to address.

Overcoming structural deficiencies in anomaly synthesis methods
Enhancing defect realism and semantic control in anomaly generation
Improving accuracy and speed in anomaly detection tasks
Innovation

Methods, ideas, or system contributions that make the work stand out.

Language-conditioned auto-regressive anomaly synthesis
Token-anchored latent editing for precise defect injection
Dynamic weighting strategy with image-text similarity scoring
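The dynamic weighting idea listed above can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's formulation: the embeddings are taken as already computed by some dual-encoder model, and the softmax normalization, the `temperature` parameter, and all function names are hypothetical choices made for the example.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def quality_weights(image_embs, text_embs, temperature=0.1):
    """Map image-text similarity scores to per-sample weights.

    A softmax over the scores (assumed here) is rescaled so the
    weights average to 1, keeping the overall loss scale unchanged.
    """
    scores = [cosine_similarity(i, t) for i, t in zip(image_embs, text_embs)]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp((s - top) / temperature) for s in scores]
    total = sum(exps)
    n = len(scores)
    return [n * e / total for e in exps]


def reweighted_loss(per_sample_losses, weights):
    """Weighted mean of per-sample detection losses."""
    pairs = zip(weights, per_sample_losses)
    return sum(w * l for w, l in pairs) / len(per_sample_losses)


# Two synthetic samples: the first matches its prompt embedding exactly.
image_embs = [[1.0, 0.0], [0.6, 0.8]]
text_embs = [[1.0, 0.0], [1.0, 0.0]]
weights = quality_weights(image_embs, text_embs)
print(weights, reweighted_loss([0.5, 0.9], weights))
```

A lower temperature concentrates weight on the best-aligned samples, while a higher one approaches uniform weighting. How the actual method schedules or adapts the weights during training is not specified in this summary.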
Long Qian
Foundation Model Research Center, Institute of Automation, Chinese Academy of Sciences, Beijing, China
Bingke Zhu
Institute of Automation, Chinese Academy of Sciences
Yingying Chen
Foundation Model Research Center, Institute of Automation, Chinese Academy of Sciences, Beijing, China
Ming Tang
Foundation Model Research Center, Institute of Automation, Chinese Academy of Sciences, Beijing, China
Jinqiao Wang
Foundation Model Research Center, Institute of Automation, Chinese Academy of Sciences, Beijing, China; Objecteye Inc., Beijing, China