🤖 AI Summary
This work addresses inaccurate modeling of depth discontinuities and occluded regions in stereo vision. We propose an interpretable approach that combines analytical 3D surface geometry with deep learning. Specifically, we embed analytical 3D surface models, expressed as viewed from a cyclopean eye and incorporating depth discontinuities and occlusions, into a stereo matching framework built on learned stereo features. A prior monocular surface model fills in occluded and textureless regions where data matching is insufficient. Compared with state-of-the-art purely data-driven approaches, our method achieves comparable depth estimation accuracy while producing depth maps of much better visual quality. These qualitative improvements may be useful in downstream applications such as virtual reality, for a better human experience, and robotics, for reducing critical errors.
📝 Abstract
We innovate in stereo vision by explicitly providing analytical 3D surface models, as viewed by a cyclopean eye model, that incorporate depth discontinuities and occlusions. This geometric foundation, combined with learned stereo features, allows our system to benefit from the strengths of both approaches. We also invoke a prior monocular model of surfaces to fill in occluded or textureless regions where data matching is not sufficient. Our results are already on par with state-of-the-art purely data-driven methods and are of much better visual quality, which emphasizes the importance of the 3D geometric model for capturing critical visual information. Such qualitative improvements may find application in virtual reality, for a better human experience, and in robotics, for reducing critical errors. Our approach aims to demonstrate that understanding and modeling the geometric properties of 3D surfaces is beneficial to computer vision research.
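For readers unfamiliar with the term, a cyclopean eye is commonly taken to be a virtual camera placed midway between the left and right views. The sketch below illustrates only that textbook geometry for a rectified pair and is not the surface model described in the abstract. It assumes pinhole cameras with focal length `focal_px`, a horizontal baseline `baseline_m`, and image columns measured relative to the principal point; all function and parameter names are hypothetical.

```python
def project_to_stereo(x_world, z_world, focal_px, baseline_m):
    """Project a 3D point onto the left and right image columns.

    Assumption: the cameras sit at -baseline/2 and +baseline/2 on the
    x-axis, so the origin is the cyclopean (midpoint) viewpoint.
    """
    x_left = focal_px * (x_world + baseline_m / 2.0) / z_world
    x_right = focal_px * (x_world - baseline_m / 2.0) / z_world
    return x_left, x_right


def to_cyclopean(x_left, x_right):
    """Convert a matched column pair to (cyclopean column, disparity)."""
    x_cyclopean = 0.5 * (x_left + x_right)  # column seen by the midpoint camera
    disparity = x_left - x_right            # positive for points in front of the rig
    return x_cyclopean, disparity


def depth_from_disparity(disparity, focal_px, baseline_m):
    """Recover depth with the standard relation Z = f * B / d."""
    if disparity <= 0:
        raise ValueError("disparity must be positive for a finite depth")
    return focal_px * baseline_m / disparity
```

Under these assumptions the cyclopean column equals `focal_px * x_world / z_world`, the projection the midpoint camera would observe directly. That is why disparity and depth can be indexed on a single cyclopean image grid rather than on the left or right image.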