AI Summary
Addressing monocular depth estimation without all-in-focus (AIF) image guidance, this paper proposes a self-supervised framework that takes a single defocused image as input and jointly predicts a defocus map and the Circle of Confusion (CoC) to regress scene depth. Our key innovation lies in integrating 3D Gaussian Splatting into defocus image rendering, enabling differentiable, self-supervised signal generation and thus eliminating reliance on AIF reference images or ground-truth depth labels. A Siamese network architecture facilitates end-to-end joint optimization of defocus and depth estimation. Extensive experiments on both synthetic and real-world defocused datasets demonstrate that our method significantly outperforms conventional Depth from Defocus (DFD) approaches, achieving state-of-the-art quantitative accuracy and visual quality in depth prediction.
Abstract
Depth estimation is a fundamental task in 3D geometry. While stereo depth estimation can be achieved through triangulation, it is not as straightforward for monocular methods, which require the integration of global and local information. The Depth from Defocus (DFD) method uses camera lens models and parameters to recover depth information from blurred images and has been shown to perform well. However, these methods rely on All-In-Focus (AIF) images for depth estimation, which are nearly impossible to obtain in real-world applications. To address this issue, we propose a self-supervised framework based on 3D Gaussian splatting and Siamese networks. By learning the blur levels at different focal distances of the same scene in the focal stack, the framework predicts the defocus map and Circle of Confusion (CoC) from a single defocused image, and uses the defocus map as input to DepthNet for monocular depth estimation. The 3D Gaussian splatting model renders defocused images using the predicted CoC, and the differences between these renderings and the real defocused images provide additional supervision signals for the Siamese defocus self-supervised network. The framework has been validated on both synthetically blurred and real blurred datasets. Quantitative and visualization experiments demonstrate that our proposed framework is a highly effective DFD method.
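The camera lens model that DFD methods build on is the standard thin-lens relation between scene depth and CoC diameter. As a minimal sketch of that relation (the function name and parameters below are illustrative, not taken from the paper): a point at distance `depth` from a lens of focal length `focal_len`, focused at `focus_dist` with aperture `f_number`, produces a blur circle whose diameter grows with the distance from the focal plane.

```python
import numpy as np

def coc_diameter(depth, focus_dist, focal_len, f_number):
    """Thin-lens Circle of Confusion (CoC) diameter for points at `depth`.

    All distances share the same unit (e.g. metres). A point exactly at
    `focus_dist` yields a CoC of zero, i.e. it is perfectly in focus;
    the CoC grows as the point moves away from the focal plane.
    """
    aperture = focal_len / f_number  # aperture diameter from the f-number
    return (aperture * focal_len * np.abs(depth - focus_dist)
            / (depth * (focus_dist - focal_len)))

# Example: a 50 mm lens at f/1.8 focused at 2 m.
depths = np.array([1.0, 2.0, 4.0])
coc = coc_diameter(depths, focus_dist=2.0, focal_len=0.05, f_number=1.8)
# coc[1] is exactly zero (in focus); the other entries are positive.
```

Because this mapping is invertible on either side of the focal plane (given the lens parameters), a predicted CoC or defocus map carries depth information, which is what allows the framework to regress depth from a single defocused image.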