🤖 AI Summary
This work addresses the ill-posed nature of sparse feature matching, such as matching facial landmarks, in stereo settings. Occlusion, motion, and camera distortion make the problem ambiguous, and the two point sets may follow different annotation conventions. To tackle this, the authors combine optimal transport (OT) with line constraints derived from multi-view camera geometry. They model projected image points as 3D (half-)lines and measure matching quality with the classical epipolar distance and with a 3D ray distance. Each distance serves as the cost of a partial OT problem, which reduces to an efficiently solvable assignment problem. The framework is further extended to unsupervised object matching through hierarchical OT. The experiments focus on facial analysis, in particular aligning landmarks from distinct annotation conventions, and show efficient feature and object matching.
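The two distances mentioned above can be made concrete with a short sketch. It assumes pinhole cameras of the form x ~ K(RX + t), a fundamental matrix F with x2^T F x1 = 0 for true correspondences, and the symmetric variant of the epipolar distance. The paper's exact variants are not stated here, and all function names (`epipolar_cost`, `back_project`, `half_line_distance`, `ray_cost`) are hypothetical.

```python
import numpy as np


def to_homogeneous(pts):
    """Append a column of ones to (n, 2) pixel coordinates."""
    return np.hstack([pts, np.ones((pts.shape[0], 1))])


def epipolar_cost(pts1, pts2, F):
    """Symmetric epipolar distance between every point of image 1 and image 2.

    Assumes x2^T F x1 = 0 for a true correspondence; returns an (n, m) matrix.
    """
    x1, x2 = to_homogeneous(pts1), to_homogeneous(pts2)
    lines2 = x1 @ F.T                      # row i: epipolar line F x1_i in image 2
    lines1 = x2 @ F                        # row j: epipolar line F^T x2_j in image 1
    resid = np.abs(lines2 @ x2.T)          # |x2_j^T F x1_i|, shape (n, m)
    norm2 = np.linalg.norm(lines2[:, :2], axis=1)[:, None]
    norm1 = np.linalg.norm(lines1[:, :2], axis=1)[None, :]
    return resid / norm2 + resid / norm1   # point-to-line distance in both images


def back_project(pts, K, R, t):
    """Camera centre and unit ray directions (world frame) for x ~ K (R X + t)."""
    center = -R.T @ t
    dirs = (R.T @ np.linalg.inv(K) @ to_homogeneous(pts).T).T
    return center, dirs / np.linalg.norm(dirs, axis=1, keepdims=True)


def half_line_distance(c1, d1, c2, d2):
    """Distance between half-lines c1 + s*d1 and c2 + u*d2 with s, u >= 0 (unit d1, d2)."""
    w = c1 - c2
    b, e1, e2 = d1 @ d2, d1 @ w, d2 @ w

    def dist(s, u):
        return np.linalg.norm(w + s * d1 - u * d2)

    denom = 1.0 - b * b
    if denom > 1e-12:                      # non-parallel: unconstrained closest points
        s = (b * e2 - e1) / denom
        u = (e2 - b * e1) / denom
        if s >= 0 and u >= 0:
            return dist(s, u)
    # Otherwise the convex minimum over s, u >= 0 lies on the edge s = 0 or u = 0.
    return min(dist(0.0, max(0.0, e2)), dist(max(0.0, -e1), 0.0))


def ray_cost(pts1, cam1, pts2, cam2):
    """(n, m) matrix of 3D half-line distances; each cam is a tuple (K, R, t)."""
    c1, D1 = back_project(pts1, *cam1)
    c2, D2 = back_project(pts2, *cam2)
    return np.array([[half_line_distance(c1, a, c2, b) for b in D2] for a in D1])
```

Either matrix can serve as the cost input to a matching step. The ray distance needs calibrated cameras (K, R, t), whereas the epipolar distance needs only F. Restricting to half-lines (s, u >= 0) discards closest points that lie behind a camera.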
📝 Abstract
Stereo vision between images faces a range of challenges, including occlusions, motion, and camera distortions, across applications in autonomous driving, robotics, and face analysis. Due to parameter sensitivity, further complications arise for stereo matching with sparse features, such as facial landmarks. To overcome this ill-posedness and enable unsupervised sparse matching, we consider line constraints of the camera geometry from an optimal transport (OT) viewpoint. Formulating camera-projected points as (half)lines, we propose the use of the classical epipolar distance as well as a 3D ray distance to quantify matching quality. Employing these distances as a cost function of a (partial) OT problem, we arrive at efficiently solvable assignment problems. Moreover, we extend our approach to unsupervised object matching by formulating it as a hierarchical OT problem. The resulting algorithms allow for efficient feature and object matching, as demonstrated in our numerical experiments. Here, we focus on applications in facial analysis, where we aim to match distinct landmarking conventions.
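One standard way to turn such a cost matrix into a partial assignment problem is to give every point a dummy "unmatched" option at a fixed cost and then solve the enlarged square problem with SciPy's `linear_sum_assignment`. The sketch below is built on that assumption. The two-level function is a simplified, hypothetical reading of the hierarchical step: it scores each object pair by its size-normalised inner matching cost and then matches objects with the same routine. Neither function is claimed to be the authors' formulation. The names `partial_assignment`, `hierarchical_match`, `unmatched_cost`, and `object_unmatched_cost` are illustrative.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def partial_assignment(C, unmatched_cost):
    """Partial one-to-one matching for a nonnegative (n, m) cost matrix C.

    Each point left without a partner costs `unmatched_cost`, so in an optimal
    solution no matched pair costs more than 2 * unmatched_cost.
    Returns the matched (i, j) pairs and the total cost.
    """
    n, m = C.shape
    big = unmatched_cost * (n + m) + 1.0        # worse than leaving every point unmatched
    A = np.full((n + m, m + n), big)
    A[:n, :m] = C                                         # real point i <-> real point j
    A[np.arange(n), m + np.arange(n)] = unmatched_cost    # row i -> its own dummy column
    A[n + np.arange(m), np.arange(m)] = unmatched_cost    # dummy row -> column j (j unmatched)
    A[n:, m:] = 0.0                                       # dummy <-> dummy is free
    rows, cols = linear_sum_assignment(A)
    pairs = [(int(i), int(j)) for i, j in zip(rows, cols) if i < n and j < m]
    return pairs, float(A[rows, cols].sum())


def hierarchical_match(groups1, groups2, cost_fn, unmatched_cost, object_unmatched_cost):
    """Hypothetical two-level matching: objects (e.g. faces), then points inside them.

    groups1, groups2: lists of (k, 2) point arrays, one array per object.
    cost_fn(p, q): point-level cost matrix between two objects.
    """
    outer = np.zeros((len(groups1), len(groups2)))
    inner = {}
    for a, p in enumerate(groups1):
        for b, q in enumerate(groups2):
            pairs, total = partial_assignment(cost_fn(p, q), unmatched_cost)
            inner[a, b] = pairs
            outer[a, b] = total / (len(p) + len(q))   # size-normalised object cost
    obj_pairs, _ = partial_assignment(outer, object_unmatched_cost)
    return [((a, b), inner[a, b]) for a, b in obj_pairs]
```

Each unmatched point is penalised separately, so an optimal solution never keeps a pair whose cost exceeds twice the penalty. This allows landmark sets of different sizes, for example from different annotation conventions, to be matched without forcing every point to have a partner.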