Depth from Dual Differential Defocus and Stereo Consensus

📅 2026-06-01
📈 Citations: 0
Influential: 0

🤖 AI Summary
This work addresses the difficulty of achieving both high accuracy and an extended working range in depth estimation on compact devices, where short baselines and limited depth of field normally force a trade-off. The authors fuse Dual Differential Defocus (D³) cues with passive stereo, using a physics-based closed-form solution that yields an overdetermined set of depth estimates. A consensus step between the two physically independent cues then selects the most confident estimate and rejects unreliable ones. According to the authors' analysis, the method reaches a working range comparable to previous triangulation-based systems at the same error tolerance with a 10x smaller baseline. A prototype with a 4 mm baseline and 12 mm effective focal length (EFL) produces depth maps of up to 900×1800 pixels with 1 cm mean absolute error over 0.3–1.64 m from a single snapshot, surpassing the reported accuracy of certain commercial stereo cameras with much larger form factors.
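To illustrate why a short baseline normally limits range, the standard pinhole stereo relation Z = f·B/d implies that a fixed disparity error produces a depth error growing roughly as Z²/(f·B). The sketch below evaluates this first-order relation using the 4 mm baseline and 12 mm EFL quoted for the prototype. The pixel pitch and disparity error are hypothetical values chosen only for illustration and do not come from the paper, and the function name `stereo_depth_error` is likewise an assumption. This is textbook stereo triangulation, not the D³S method itself.

```python
def stereo_depth_error(z_m, focal_m, baseline_m, disparity_err_m):
    """First-order depth error of pinhole stereo: dZ ~= Z^2 * dd / (f * B).

    All quantities are in metres; disparity_err_m is the disparity
    uncertainty measured on the sensor plane.
    """
    return z_m ** 2 * disparity_err_m / (focal_m * baseline_m)


FOCAL_M = 12e-3          # 12 mm EFL, as quoted for the prototype
BASELINE_M = 4e-3        # 4 mm baseline, as quoted for the prototype
PIXEL_PITCH_M = 3e-6     # hypothetical pixel pitch (assumption)
DISPARITY_ERR_PX = 0.25  # hypothetical sub-pixel matching error (assumption)

if __name__ == "__main__":
    dd = DISPARITY_ERR_PX * PIXEL_PITCH_M
    for z in (0.3, 1.0, 1.64):
        narrow = stereo_depth_error(z, FOCAL_M, BASELINE_M, dd)
        wide = stereo_depth_error(z, FOCAL_M, 10 * BASELINE_M, dd)
        print(f"Z = {z:.2f} m: error {narrow * 100:.2f} cm at 4 mm baseline, "
              f"{wide * 100:.2f} cm at 40 mm baseline")
```

Under these assumed numbers, stereo alone degrades quadratically with distance, and a 10x larger baseline reduces the error by 10x. That gap is what an additional, physically independent depth cue would need to close.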
📝 Abstract
We introduce D^3S Consensus, a physics-based, closed-form algorithm that unifies depth-from-defocus (DfD) and stereo to achieve highly accurate depth estimation throughout an extended working range beyond the depth-of-field (DoF) of cameras. Given a pair of dual-defocus stereo images, the method estimates an overdetermined set of depth values using a novel DfD theory, Dual Differential Defocus (D^3), and (S)tereo in a coupled fashion. It then picks the most confident depth prediction from the set by enforcing consensus between these physically independent cues to reject unreliable estimates. Analysis shows that D^3S achieves a comparable working range under the same error tolerance with a 10x smaller baseline than previous triangulation-based depth estimation systems. This enables compact passive binocular rangefinders with substantially smaller form factors than conventional stereo and DfD designs. We demonstrate the first D^3S prototype, with only a 4 mm baseline and a 12 mm effective focal length (EFL). It generates depth maps of up to 900 x 1800 pixels with 1 cm mean absolute error over 0.3-1.64 m from a snapshot acquisition. This surpasses the reported accuracy of certain commercially available stereo cameras with much larger form factors.
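The consensus idea described above can be sketched generically for a single pixel. This is not the authors' algorithm: the abstract does not specify the agreement test or the confidence measure, so the relative tolerance, the confidence scores, and the function name `select_by_consensus` are all assumptions made for illustration. The sketch keeps a depth hypothesis only if a hypothesis from the other cue agrees with it, then returns the most confident survivor.

```python
def select_by_consensus(hypotheses, rel_tol=0.05):
    """Pick the most confident depth that is corroborated by the other cue.

    hypotheses: list of (depth_m, confidence, cue) tuples, where cue is a
    label such as "defocus" or "stereo". The tolerance and the confidence
    scale are illustrative assumptions, not values from the paper.
    Returns the selected depth in metres, or None if no cross-cue agreement.
    """
    supported = []
    for depth, conf, cue in hypotheses:
        # A hypothesis is supported if any hypothesis from a different cue
        # lies within rel_tol of it.
        agrees = any(
            other_cue != cue and abs(depth - other_depth) <= rel_tol * depth
            for other_depth, _, other_cue in hypotheses
        )
        if agrees:
            supported.append((conf, depth))
    if not supported:
        return None
    # Among corroborated hypotheses, return the one with highest confidence.
    return max(supported, key=lambda cd: cd[0])[1]


if __name__ == "__main__":
    pixel = [
        (1.00, 0.90, "defocus"),
        (1.02, 0.60, "stereo"),
        (2.50, 0.95, "stereo"),  # confident but uncorroborated outlier
    ]
    print(select_by_consensus(pixel))
```

In this example the 2.50 m stereo hypothesis has the highest confidence but is rejected because no defocus hypothesis supports it, so the corroborated 1.00 m estimate is selected. Requiring agreement across cues, instead of trusting confidence alone, is the design choice being illustrated.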
Problem

Research questions and friction points this paper is trying to address.

depth estimation
depth-from-defocus
stereo vision
working range
depth-of-field
Innovation

Methods, ideas, or system contributions that make the work stand out.

Dual Differential Defocus
Stereo Consensus
Depth-from-Defocus
Compact Binocular Rangefinder
Closed-form Depth Estimation