🤖 AI Summary
Existing Shape-from-Focus (SFF) methods typically adopt a two-stage paradigm: first extracting a focus volume via complex encoders, then estimating depth via simple aggregation—leading to artifacts and noise amplification. This work proposes an end-to-end lightweight framework that synergistically integrates handcrafted priors with recurrent modeling capabilities. Specifically, we introduce: (i) a novel multi-scale directional-dilated Laplacian (DDL) operator to construct a robust focus volume representation; and (ii) a GRU-driven iterative low-resolution depth refinement module coupled with a learnable convex upsampling mechanism. Evaluated on both synthetic and real-world datasets, our method achieves significant improvements over state-of-the-art approaches—yielding higher depth accuracy, superior boundary preservation, strong generalization across diverse scenes, and efficient inference.
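To make the directional-dilated Laplacian idea concrete, here is a minimal NumPy sketch of one plausible form of such a focus measure: absolute second differences along several directions at multiple dilations, accumulated into a per-pixel focus score. The exact kernels, directions, and dilations of the paper's DDL operator are not given here, so everything below is an illustrative assumption, not the authors' implementation.

```python
import numpy as np

def ddl_focus_measure(img, dilations=(1, 2, 4)):
    """Hypothetical directional dilated Laplacian (DDL) focus measure.

    For each direction u and dilation d, accumulate the absolute second
    difference |2*I(x) - I(x - d*u) - I(x + d*u)|. Larger dilations
    capture longer-range focus variation. Edge handling via np.roll
    (wrap-around) is a simplification for this sketch.
    """
    directions = [(0, 1), (1, 0), (1, 1), (1, -1)]  # horiz, vert, two diagonals
    fm = np.zeros_like(img, dtype=np.float64)
    for d in dilations:
        for dy, dx in directions:
            pos = np.roll(img, shift=(d * dy, d * dx), axis=(0, 1))
            neg = np.roll(img, shift=(-d * dy, -d * dx), axis=(0, 1))
            fm += np.abs(2.0 * img - pos - neg)
    return fm

def focus_volume(stack, **kw):
    """Apply the measure to every slice of a focal stack (N, H, W).

    The argmax over N then gives a coarse per-pixel depth index, which a
    downstream module could refine.
    """
    return np.stack([ddl_focus_measure(s, **kw) for s in stack])
```

A sharp (in-focus) slice should produce a larger response than a blurred copy of the same content, which is what lets the volume rank focal-stack slices per pixel.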
📝 Abstract
Shape-from-Focus (SFF) is a passive depth estimation technique that infers scene depth by analyzing focus variations in a focal stack. Most recent deep learning-based SFF methods operate in two stages: first, they extract focus volumes (a per-pixel representation of focus likelihood across the focal stack) using heavy feature encoders; then, they estimate depth via a simple one-step aggregation technique that often introduces artifacts and amplifies noise in the depth map. To address these issues, we propose a hybrid framework. Our method computes multi-scale focus volumes using traditional handcrafted Directional Dilated Laplacian (DDL) kernels, which capture long-range and directional focus variations to form robust focus volumes. These focus volumes are then fed into a lightweight, multi-scale GRU-based depth extraction module that iteratively refines an initial depth estimate at a lower resolution for computational efficiency. Finally, a learned convex upsampling module within our recurrent network reconstructs high-resolution depth maps while preserving fine scene details and sharp boundaries. Extensive experiments on both synthetic and real-world datasets demonstrate that our approach outperforms state-of-the-art deep learning and traditional methods, achieving superior accuracy and generalization across diverse focal conditions.
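The learned convex upsampling step can be sketched as follows. Assuming a RAFT-style formulation (the paper's exact module may differ), the network predicts, for each high-resolution pixel, softmax logits over the 3x3 low-resolution neighbourhood; the upsampled depth is the resulting convex combination. The `weights` tensor shape and the zero-logit usage below are illustrative assumptions.

```python
import numpy as np

def convex_upsample(depth, weights, factor=4):
    """Sketch of learned convex upsampling (RAFT-style assumption).

    depth:   (H, W) low-resolution depth map.
    weights: (H, W, factor, factor, 9) logits predicted by the network;
             softmax over the last axis yields convex weights over each
             low-res pixel's 3x3 neighbourhood.
    Returns an (H*factor, W*factor) depth map.
    """
    H, W = depth.shape
    # Softmax over the 9 neighbours -> nonnegative weights summing to 1,
    # so each output value is a convex combination (no over/undershoot).
    w = np.exp(weights - weights.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    # Gather the 3x3 neighbourhood of every low-res pixel (edge-replicated).
    pad = np.pad(depth, 1, mode="edge")
    neigh = np.stack([pad[1 + dy:1 + dy + H, 1 + dx:1 + dx + W]
                      for dy in (-1, 0, 1) for dx in (-1, 0, 1)], axis=-1)
    up = (w * neigh[:, :, None, None, :]).sum(-1)        # (H, W, f, f)
    return up.transpose(0, 2, 1, 3).reshape(H * factor, W * factor)
```

Because the weights are convex, the upsampler interpolates rather than extrapolates, which is one reason this style of module tends to preserve sharp depth boundaries better than bilinear upsampling.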