🤖 AI Summary
This paper addresses pixel-level anomaly detection for safe navigation in unstructured off-road environments, where no prior knowledge of anomalies is available and ground-truth anomaly annotations are infeasible to collect. It proposes a generative diffusion-based method that requires neither out-of-distribution (OOD) samples nor anomaly labels. The approach follows an analysis-by-synthesis paradigm: first, guided diffusion inference, steered by an approximation of the ideal guidance gradient, synthesizes an anomaly-free version of the input image; second, because the editing happens purely at test time, anomalies are localized without retraining; third, semantically meaningful edits are identified by combining CLIP and Segment Anything. Technically, the method integrates pixel-wise comparison in a learned feature space, guidance-gradient approximation, and bootstrapping of the diffusion model to predict guidance gradients. Experiments demonstrate that the method significantly outperforms existing reconstruction- and probability-based approaches in off-road scenarios, enables real-time deployment, and operates entirely without model fine-tuning or retraining.
📝 Abstract
To navigate safely and reliably in off-road and unstructured environments, robots must detect anomalies that are out-of-distribution (OOD) with respect to the training data. We present an analysis-by-synthesis approach for pixel-wise anomaly detection that makes no assumptions about the nature of OOD data. Given an input image, we use a generative diffusion model to synthesize an edited image that removes anomalies while keeping the rest of the image unchanged. We then formulate anomaly detection as analyzing which image segments were modified by the diffusion model. We propose a novel inference approach for guided diffusion by analyzing the ideal guidance gradient and deriving a principled approximation that bootstraps the diffusion model to predict guidance gradients. Our editing technique operates purely at test time and can be integrated into existing workflows without retraining or fine-tuning. Finally, we use a combination of vision-language foundation models to compare pixels in a learned feature space and detect semantically meaningful edits, enabling accurate anomaly detection for off-road navigation. Project website: https://siddancha.github.io/anomalies-by-diffusion-synthesis/
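The final step described above, deciding which image segments the diffusion model modified, can be illustrated with a toy sketch. This is not the authors' implementation: it assumes per-pixel feature vectors for the input and edited images are already available (in the paper these would come from a vision-language model), that a segment label map is given (for example from a segmentation model), and that cosine distance is the comparison metric. The function names, the tiny hand-written features, and the 0.5 threshold are all hypothetical choices made for illustration.

```python
import math


def cosine_distance(u, v, eps=1e-8):
    """Cosine distance between two feature vectors (0 means same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm + eps)


def segment_anomaly_scores(feat_input, feat_edited, segments):
    """Average the per-pixel feature distance inside each segment.

    feat_input, feat_edited: H x W grids of feature vectors (assumed given).
    segments: H x W grid of integer segment ids (assumed given).
    Returns {segment_id: mean cosine distance between input and edited pixels}.
    """
    totals, counts = {}, {}
    for row_in, row_ed, row_seg in zip(feat_input, feat_edited, segments):
        for u, v, seg_id in zip(row_in, row_ed, row_seg):
            totals[seg_id] = totals.get(seg_id, 0.0) + cosine_distance(u, v)
            counts[seg_id] = counts.get(seg_id, 0) + 1
    return {seg_id: totals[seg_id] / counts[seg_id] for seg_id in totals}


# Toy 2x2 "image" with 2-dimensional features per pixel (hypothetical values).
feat_input = [[[1.0, 0.0], [1.0, 0.0]],
              [[0.0, 1.0], [0.0, 1.0]]]
# The edited image differs only at pixel (0, 1), standing in for a removed anomaly.
feat_edited = [[[1.0, 0.0], [0.0, 1.0]],
               [[0.0, 1.0], [0.0, 1.0]]]
# Pixel (0, 1) forms segment 1; every other pixel belongs to segment 0.
segments = [[0, 1],
            [0, 0]]

scores = segment_anomaly_scores(feat_input, feat_edited, segments)

# Hypothetical threshold: segments whose features changed a lot are flagged.
THRESHOLD = 0.5
anomalous = {seg_id for seg_id, score in scores.items() if score > THRESHOLD}
```

Averaging within segments rather than thresholding individual pixels is one simple way to make the decision robust to small pixel-level differences that the edit may introduce in unchanged regions. A practical version would operate on arrays rather than nested lists.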