🤖 AI Summary
Single-image reflection removal generalizes poorly because the transmission and reflection layers are tightly entangled, high-quality and diverse training data are scarce, and existing restoration priors are insufficient. To address these challenges, this work proposes: (1) a data preparation pipeline that randomly rotates reflective media in target scenes to vary reflection angle and intensity, yielding the large, diverse Diverse Reflection Removal (DRR) dataset; (2) a diffusion-based framework that uses one-step diffusion for deterministic outputs and fast inference; and (3) a three-stage progressive training strategy that includes reflection-invariant finetuning, which encourages consistent outputs across varying reflection patterns. Extensive experiments show state-of-the-art performance on common benchmarks and on challenging in-the-wild images, with improved generalization across diverse real-world scenes.
📝 Abstract
Single-image reflection removal remains a highly challenging task due to the complex entanglement between target scenes and unwanted reflections. Despite significant progress, existing methods are hindered by the scarcity of high-quality, diverse data and by insufficient restoration priors, resulting in limited generalization across real-world scenarios. In this paper, we propose Dereflection Any Image, a comprehensive solution with an efficient data preparation pipeline and a generalizable model for robust reflection removal. First, we introduce a dataset named Diverse Reflection Removal (DRR), created by randomly rotating reflective media in target scenes, which varies reflection angles and intensities and sets a new benchmark in scale, quality, and diversity. Second, we propose a diffusion-based framework that uses one-step diffusion for deterministic outputs and fast inference. To ensure stable learning, we design a three-stage progressive training strategy that includes reflection-invariant finetuning, which encourages consistent outputs across the varying reflection patterns that characterize our dataset. Extensive experiments show that our method achieves state-of-the-art (SOTA) performance on both common benchmarks and challenging in-the-wild images, demonstrating superior generalization across diverse real-world scenes.
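The abstract does not spell out the form of the reflection-invariant objective, so the following is only a minimal sketch of the general idea under stated assumptions: several inputs share one transmission layer but carry different reflections, and the model is penalized both for deviating from the clean target and for producing different outputs across those variants. The additive blend `I = T + strength * R` is a common toy stand-in and is not the DRR capture process, which rotates physical reflective media. The names `blend_reflection`, `reflection_invariant_loss`, and `weight` are hypothetical and chosen for illustration.

```python
from typing import Callable, List, Sequence

Image = List[float]  # flattened image, values in [0, 1] (toy representation)


def blend_reflection(transmission: Sequence[float],
                     reflection: Sequence[float],
                     strength: float) -> Image:
    """Toy additive model (an assumption): I = T + strength * R, clipped to [0, 1]."""
    return [min(1.0, max(0.0, t + strength * r))
            for t, r in zip(transmission, reflection)]


def mse(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean squared error between two equally sized flattened images."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def reflection_invariant_loss(model: Callable[[Image], Image],
                              transmission: Sequence[float],
                              reflections: Sequence[Sequence[float]],
                              strengths: Sequence[float],
                              weight: float = 1.0) -> float:
    """Fidelity to the clean target plus pairwise consistency across variants."""
    # Build several inputs that share the same transmission layer.
    inputs = [blend_reflection(transmission, r, s)
              for r, s in zip(reflections, strengths)]
    outputs = [model(x) for x in inputs]

    # Fidelity term: each prediction should match the clean transmission.
    fidelity = sum(mse(o, transmission) for o in outputs) / len(outputs)

    # Consistency term: predictions should agree regardless of the reflection.
    pairs = [(i, j) for i in range(len(outputs))
             for j in range(i + 1, len(outputs))]
    consistency = (sum(mse(outputs[i], outputs[j]) for i, j in pairs) / len(pairs)
                   if pairs else 0.0)

    return fidelity + weight * consistency
```

In this sketch, a model that always recovers the transmission layer scores zero, while a model that passes its input through unchanged is penalized by both terms. A real training setup would compute the same quantities on image tensors with the diffusion model's one-step prediction in place of `model`.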