🤖 AI Summary
Recovering global camera positions from relative translations that are known only up to direction is a highly ill-posed problem and is particularly sensitive to outliers. This work proposes TriP, a framework that infers local edge scales through triangle-based geometric reasoning, then synchronizes the scales of overlapping triangles in the logarithmic domain to obtain globally consistent edge lengths and camera positions. By leveraging the higher-order consistency inherent in triangles, TriP gains robustness to structured corruption, avoids scale collapse without additional constraints, and admits theoretical guarantees for exact location recovery. Experiments show that TriP outperforms existing methods by a large margin on both synthetic and real-world datasets, and that it scales efficiently to graphs with millions of cameras.
📝 Abstract
Translation averaging aims to recover camera locations from pairwise relative translation directions and is a fundamental component of global Structure-from-Motion pipelines. The problem is challenging because direction measurements carry no distance information, which makes the estimation ill-conditioned and highly sensitive to corrupted observations. In this paper, we propose TriP, a triangle-based framework for robust translation averaging. TriP first infers local relative edge scales from triangle geometry, and then synchronizes the scales of overlapping triangles in the logarithmic domain to recover globally consistent edge lengths and camera locations. By leveraging higher-order consistency across triangles, the proposed method is robust to adversarial, cycle-consistent, and other structured corruptions. In addition, TriP avoids the collapse issue without requiring any extra anti-collapse constraints, since log-scale synchronization excludes the degenerate zero-scale solution by construction. These structural advantages support strong theoretical guarantees for exact location recovery. On the practical side, TriP is fully parallelizable, computationally efficient, and naturally scalable to graphs with millions of cameras. Moreover, it outperforms all previous translation averaging methods by a large margin on both synthetic and real datasets.
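To make the two stages concrete, the following is a minimal, noise-free sketch and not the authors' implementation. It assumes that "local relative edge scales from triangle geometry" can be obtained with the law of sines (each edge length is proportional to the sine of the opposite angle), and it replaces whatever robust solver TriP actually uses with plain least squares, which is not robust to outliers. All function names (`triangle_log_ratios`, `synchronize_log_scales`, `recover_locations`) are hypothetical.

```python
import itertools
import numpy as np

def triangle_log_ratios(d, tri):
    """Log relative edge lengths of one triangle (law of sines; an assumption).

    d[(a, b)] is the unit direction from camera a to camera b.
    Degenerate (collinear) triangles give sin = 0 and are not handled here.
    """
    i, j, k = tri
    out = {}
    for (a, b), c in (((i, j), k), ((j, k), i), ((i, k), j)):
        # Sine of the angle at vertex c, which is opposite edge (a, b).
        s = np.linalg.norm(np.cross(d[(c, a)], d[(c, b)]))
        out[(a, b)] = np.log(s)
    return out

def synchronize_log_scales(edges, triangles, d):
    """Solve x_e - s_t = log r_{t,e} for log edge lengths x and log triangle scales s.

    Plain least squares is a stand-in for a robust solver. The minimum-norm
    solution fixes the global-scale gauge; exp() keeps every length positive,
    so the all-zero (collapsed) solution cannot occur.
    """
    e_idx = {e: n for n, e in enumerate(edges)}
    rows, rhs = [], []
    for t, tri in enumerate(triangles):
        for e, log_r in triangle_log_ratios(d, tri).items():
            row = np.zeros(len(edges) + len(triangles))
            row[e_idx[e]] = 1.0          # log length of edge e
            row[len(edges) + t] = -1.0   # log scale of triangle t
            rows.append(row)
            rhs.append(log_r)
    sol, *_ = np.linalg.lstsq(np.array(rows), np.array(rhs), rcond=None)
    return np.exp(sol[:len(edges)])

def recover_locations(n, edges, lengths, d):
    """Solve p_b - p_a = length_ab * d_ab in the least-squares sense."""
    A = np.zeros((len(edges) + 1, n))
    B = np.zeros((len(edges) + 1, 3))
    for r, ((a, b), length) in enumerate(zip(edges, lengths)):
        A[r, a], A[r, b] = -1.0, 1.0
        B[r] = length * d[(a, b)]
    A[-1, 0] = 1.0  # gauge: pin camera 0 at the origin
    P, *_ = np.linalg.lstsq(A, B, rcond=None)
    return P

# Toy problem: 5 cameras, complete graph, exact directions only.
rng = np.random.default_rng(0)
n = 5
truth = rng.normal(size=(n, 3))
edges = list(itertools.combinations(range(n), 2))
triangles = list(itertools.combinations(range(n), 3))
d = {}
for a, b in edges:
    v = truth[b] - truth[a]
    d[(a, b)] = v / np.linalg.norm(v)
    d[(b, a)] = -d[(a, b)]

lengths = synchronize_log_scales(edges, triangles, d)
est = recover_locations(n, edges, lengths, d)
```

With exact directions, `lengths` should match the true edge lengths up to a single global factor, and `est` should match the true locations up to that factor and a translation. This is the usual gauge ambiguity of translation averaging. Handling corrupted directions would require a robust estimator in place of `np.linalg.lstsq`.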