🤖 AI Summary
This work addresses the challenge of achieving both high accuracy and an extended working range in depth estimation on compact devices, where short baselines and limited depth of field typically force a trade-off between the two. The authors propose an approach that fuses Dual Differential Defocus (D³) cues with passive stereo vision, using a physically consistent closed-form solution to jointly produce an overdetermined set of depth estimates. A multi-cue consensus mechanism then selects the most reliable depth hypothesis and rejects unreliable ones. The authors' analysis indicates a working range comparable to that of earlier triangulation-based systems with a 10x smaller baseline. A prototype with only a 4 mm baseline reports a mean absolute error of 1 cm over 0.3–1.64 m and produces depth maps of up to 900×1800 pixels, surpassing the reported accuracy of certain commercial stereo cameras with much larger form factors.
📝 Abstract
We introduce D^3S Consensus, a physics-based, closed-form algorithm that unifies depth-from-defocus (DfD) and stereo to achieve highly accurate depth estimation throughout an extended working range beyond the depth of field (DoF) of the cameras. Given a pair of dual-defocus stereo images, the method estimates an overdetermined set of depth values using a novel DfD theory, Dual Differential Defocus (D^3), and (S)tereo in a coupled fashion. It then picks the most confident depth prediction from the set, enforcing consensus between these physically independent cues to reject unreliable estimates.
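To make the selection step concrete, here is a minimal sketch of one way a per-pixel consensus rule over several depth candidates could look. It is not the authors' algorithm: the function name `consensus_select`, the relative tolerance `rel_tol`, the support threshold `min_support`, and the voting rule itself are all assumptions made for illustration.

```python
import numpy as np

def consensus_select(hypotheses, rel_tol=0.05, min_support=1):
    """Pick, per pixel, the depth candidate that most other candidates agree with.

    hypotheses: array of shape (K, H, W) with K depth candidates in metres
                (NaN marks a missing candidate). Hypothetical input layout.
    Returns (depth, valid): depth is (H, W) with NaN where consensus fails.
    """
    hyp = np.asarray(hypotheses, dtype=float)
    k = hyp.shape[0]
    with np.errstate(invalid="ignore"):
        # Pairwise disagreement between candidates, shape (K, K, H, W).
        diff = np.abs(hyp[:, None] - hyp[None, :])
        scale = np.minimum(np.abs(hyp[:, None]), np.abs(hyp[None, :]))
        # Comparisons involving NaN evaluate to False, so missing
        # candidates never count as agreeing.
        agree = diff <= rel_tol * scale
    idx = np.arange(k)
    # Support = number of *other* candidates within tolerance.
    support = agree.sum(axis=1) - agree[idx, idx].astype(int)
    best = support.argmax(axis=0)                      # (H, W)
    depth = np.take_along_axis(hyp, best[None], axis=0)[0]
    valid = support.max(axis=0) >= min_support
    return np.where(valid, depth, np.nan), valid

# Three candidates for a 1 x 2 image: the first pixel has two agreeing
# candidates, the second has none.
cands = np.array([[[1.00, 0.5]],
                  [[1.02, 1.0]],
                  [[2.00, 2.0]]])
depth, valid = consensus_select(cands)
```

In this toy input the first pixel keeps a candidate near 1 m because two candidates agree within 5%, while the second pixel is rejected because no two candidates agree. A real implementation would presumably weight candidates by a physically derived confidence rather than by a simple vote.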
Analysis shows that D^3S achieves a comparable working range under the same error tolerance with a 10x smaller baseline than previous triangulation-based depth estimation systems. This enables compact passive binocular rangefinders with substantially smaller form factors than conventional stereo and DfD designs. We demonstrate the first D^3S prototype, with only a 4 mm baseline and a 12 mm effective focal length (EFL). It generates depth maps of up to 900 x 1800 pixels with a 1-cm mean absolute error over 0.3-1.64 m from a snapshot acquisition. This surpasses the reported accuracy of certain commercially available stereo cameras with much larger form factors.
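For context on why the baseline matters, the sketch below applies the standard first-order pinhole stereo relation Z = f·B/d, which gives |ΔZ| ≈ Z²/(f·B)·|Δd|. This is textbook triangulation, not the paper's analysis. Treating the 1 cm mean absolute error as a hard depth tolerance, and the 40 mm comparison baseline, are assumptions for illustration; the prototype's pixel pitch is not stated, so results are given as distances on the sensor rather than in pixels.

```python
def required_disparity_precision(depth_m, focal_m, baseline_m, depth_tol_m):
    """Sensor-plane disparity precision (metres) needed for a depth tolerance.

    First-order pinhole stereo model: Z = f*B/d, so
    |dZ| ~= Z**2 / (f*B) * |dd|, rearranged here for |dd|.
    """
    return depth_tol_m * focal_m * baseline_m / depth_m**2

# Values quoted in the abstract; the tolerance interpretation is an assumption.
far_range = 1.64   # metres, far end of the reported range
efl = 12e-3        # metres, effective focal length
tol = 0.01         # metres, 1 cm

small = required_disparity_precision(far_range, efl, 4e-3, tol)
large = required_disparity_precision(far_range, efl, 40e-3, tol)  # hypothetical 10x baseline

print(f"4 mm baseline:  {small * 1e6:.3f} um of disparity precision")
print(f"40 mm baseline: {large * 1e6:.3f} um of disparity precision")
```

Under these assumptions, stereo alone at a 4 mm baseline would need disparity resolved to roughly 0.18 µm on the sensor at 1.64 m, ten times finer than at a 40 mm baseline. That gap is the kind of limit the additional defocus cue is meant to relax.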