🤖 AI Summary
To address the generalization bottleneck of the Segment Anything Model (SAM) in shadow removal—specifically its inability to distinguish shadows from their backgrounds—we propose Deshadow-Anything, a SAM-based framework for zero-shot shadow removal. Methodologically, we incorporate SAM's segmentation priors into a diffusion model to formulate a conditional denoising process; introduce a Multi-Self-Attention Guidance (MSAG) mechanism to enhance structural awareness; and propose DDPM-AIP, an adaptive input perturbation strategy that accelerates training convergence and improves detail fidelity. Fine-tuned on large-scale datasets and evaluated on standard shadow-removal benchmarks, our approach achieves state-of-the-art performance (notably higher PSNR and SSIM), generalizes across diverse scenes, and effectively preserves edge sharpness, fine-grained texture, and semantic structural consistency.
📝 Abstract
Segment Anything (SAM), an advanced universal image segmentation model trained on an expansive visual dataset, has set a new benchmark in image segmentation and computer vision. However, it struggles to distinguish shadows from their backgrounds. To address this, we developed Deshadow-Anything: leveraging the generalization afforded by large-scale datasets, we fine-tune on them to achieve image shadow removal. The diffusion model can diffuse along the edges and textures of an image, helping to remove shadows while preserving image details. Furthermore, we design Multi-Self-Attention Guidance (MSAG) and adaptive input perturbation (DDPM-AIP) to accelerate the iterative training of the diffusion model. Experiments on shadow removal tasks demonstrate that these methods effectively improve image restoration performance.
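The abstract does not spell out how DDPM-AIP works. As a rough illustration of the generic input-perturbation idea it builds on—adding extra noise to the network's input during DDPM training while keeping the original noise as the regression target—the following minimal NumPy sketch may help; the perturbation weight `gamma`, the linear noise schedule, and all function names are illustrative assumptions, not the paper's actual formulation:

```python
import numpy as np

def ddpm_input_perturbation_step(x0, t, alpha_bar, gamma=0.1, rng=None):
    """One DDPM training sample with input perturbation (illustrative sketch).

    Standard DDPM feeds the network
        x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * eps.
    Input perturbation instead feeds
        x_t' = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * (eps + gamma * xi),
    while the regression target remains the original eps.
    """
    rng = np.random.default_rng(rng)
    eps = rng.standard_normal(x0.shape)   # noise the denoiser must predict
    xi = rng.standard_normal(x0.shape)    # extra perturbation noise
    a = alpha_bar[t]
    x_t_perturbed = np.sqrt(a) * x0 + np.sqrt(1.0 - a) * (eps + gamma * xi)
    return x_t_perturbed, eps             # (network input, regression target)

# Usage with a toy linear noise schedule and a dummy "image"
T = 1000
betas = np.linspace(1e-4, 0.02, T)
alpha_bar = np.cumprod(1.0 - betas)
x0 = np.zeros((8, 8))                     # stand-in for a shadow-free image
x_t, target = ddpm_input_perturbation_step(x0, t=500, alpha_bar=alpha_bar)
```

With `gamma = 0` this reduces to ordinary DDPM training; the paper's adaptive variant presumably varies the perturbation strength during training rather than fixing it.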