🤖 AI Summary
This paper addresses a critical bottleneck in RANSAC-based relative pose estimation: its reliance on manually tuned inlier thresholds and the difficulty of optimization without ground-truth annotations. To overcome this, we propose an adaptive threshold determination method. Our key contributions are: (1) correcting the noise-scale underestimation bias in SIMFIT++ caused by data reuse and neglect of threshold sensitivity; (2) introducing a multi-image-pair collaborative filtering mechanism to enhance cross-scenario consistency and robustness of threshold estimation; and (3) establishing an analytical mapping between noise scale and threshold, enabling end-to-end adaptive threshold derivation. Experiments demonstrate that our method maintains high performance across a wide range of threshold values and significantly improves both robustness and accuracy of pose estimation—especially in the absence of ground-truth labels.
📝 Abstract
The gold standard for robustly estimating relative pose through image matching is RANSAC. While RANSAC is powerful, it requires setting an inlier threshold that determines whether the error of a correspondence under an estimated model is small enough for it to be included in the consensus set. This threshold is typically set by hand and is difficult to tune without access to ground-truth data. A method capable of automatically determining the optimal threshold would therefore be desirable. In this paper we revisit inlier noise scale estimation, which is an attractive approach because the inlier noise scale is linearly related to the optimal threshold. We revisit the noise scale estimation method SIMFIT and find a bias in its estimate of the noise scale. In particular, we fix underestimates that arise from using the same data for fitting the model as for estimating the inlier noise, and from not taking the threshold itself into account. Secondly, since the optimal threshold within a scene is approximately constant, we propose a multi-pair extension of SIMFIT++ that filters estimates across image pairs, which further improves results. Our approach yields robust performance across a range of thresholds, as shown in Figure 1.
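The two ingredients described above can be sketched in code. This is a minimal illustration, not the paper's actual implementation: it assumes 2-D Gaussian inlier residuals (so squared residual norms follow a scaled chi-square with 2 degrees of freedom, whose quantile has the closed form `-2*ln(1 - q)`), and it uses a simple median as the cross-pair filter; the function names and the choice of quantile are hypothetical.

```python
import math

def threshold_from_noise_scale(sigma, quantile=0.95):
    # Hypothetical linear mapping from noise scale to inlier threshold,
    # assuming 2-D Gaussian inlier residuals: squared residual norms are
    # sigma^2 * chi^2 with 2 DoF, and the chi^2_2 quantile is -2*ln(1 - q).
    # The resulting threshold is linear in sigma, as the abstract notes.
    return sigma * math.sqrt(-2.0 * math.log(1.0 - quantile))

def filtered_noise_scale(per_pair_sigmas):
    # Multi-pair filtering sketch: since the optimal threshold within a
    # scene is approximately constant, a robust aggregate (here a plain
    # median) over per-pair noise-scale estimates rejects outlying ones.
    s = sorted(per_pair_sigmas)
    n = len(s)
    mid = n // 2
    return s[mid] if n % 2 else 0.5 * (s[mid - 1] + s[mid])
```

For example, pooling per-pair estimates `[0.8, 1.0, 5.0]` (where the last pair's estimate is corrupted) yields a scene-level scale of 1.0, and the corresponding 95% threshold is about 2.45 pixels.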