🤖 AI Summary
To address the insufficient robustness of 6DoF pose tracking in augmented reality assembly guidance, particularly under cluttered backgrounds, rotationally symmetric objects, and noisy video sequences, this paper proposes a CPU-only real-time tracking method. The approach makes three contributions: (1) a fan-shaped search strategy that refines contour correspondences; (2) a mixed probabilistic contour energy model that jointly encodes local contour shape and the uncertainty induced by image noise; and (3) a sparse interior-point tracking mechanism that pre-samples interior points from sparse viewpoint templates offline, computes their correspondences with DIS optical flow, and fuses them with the contour term in a unified energy solved by re-weighted least squares, which helps with rotationally symmetric objects and with escaping local minima. Evaluated on public datasets and real-world scenarios, the method significantly outperforms state-of-the-art monocular trackers and runs at more than 100 FPS using only a CPU.
📝 Abstract
Augmented reality assembly guidance is essential for intelligent manufacturing and medical applications, and it requires continuous measurement of the 6DoF poses of manipulated objects. Although current tracking methods have advanced significantly in accuracy and efficiency, they still lack robustness when dealing with cluttered backgrounds, rotationally symmetric objects, and noisy sequences. In this paper, we first propose a robust contour-based pose tracking method that addresses error-prone contour correspondences and improves noise tolerance. It uses a fan-shaped search strategy to refine correspondences and models local contour shape and noise uncertainty as a mixed probability distribution, resulting in a highly robust contour energy function. Second, we introduce a CPU-only strategy to better track rotationally symmetric objects and to help the contour-based method escape local minima by exploring sparse interior correspondences. This is achieved by pre-sampling interior points from sparse viewpoint templates offline and using the DIS optical flow algorithm to compute their correspondences during tracking. Finally, we formulate a unified energy function that fuses contour and interior information and can be solved with a re-weighted least squares algorithm. Experiments on public datasets and real scenarios demonstrate that our method significantly outperforms state-of-the-art monocular tracking methods and can achieve more than 100 FPS using only a CPU.
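To make the re-weighted least squares step concrete, the following is a minimal sketch of the general technique, not the paper's energy function. It assumes a toy setting in which the unknown is a 2D translation estimated from point correspondences (as optical flow might supply), and it assumes a Huber loss for the reweighting; the abstract does not specify either. The names `huber_weight` and `irls_translation` and the parameter `delta` are hypothetical.

```python
import math


def huber_weight(residual_norm, delta):
    """IRLS weight for the Huber loss: 1 inside the threshold, delta/r outside."""
    return 1.0 if residual_norm <= delta else delta / residual_norm


def irls_translation(src, dst, delta=1.0, iters=20, tol=1e-12):
    """Estimate a 2D translation t that minimizes sum_i huber(||dst_i - src_i - t||).

    Toy stand-in for a pose update: a real tracker would solve for a 6DoF pose.
    """
    # Displacement of each correspondence.
    disp = [(bx - ax, by - ay) for (ax, ay), (bx, by) in zip(src, dst)]
    n = len(disp)
    # Start from the ordinary least-squares solution (the unweighted mean).
    tx = sum(dx for dx, _ in disp) / n
    ty = sum(dy for _, dy in disp) / n
    for _ in range(iters):
        # Recompute weights from the current residuals.
        weights = [huber_weight(math.hypot(dx - tx, dy - ty), delta) for dx, dy in disp]
        w_sum = sum(weights)
        # Solve the weighted least-squares problem, which here is a weighted mean.
        new_tx = sum(w * dx for w, (dx, _) in zip(weights, disp)) / w_sum
        new_ty = sum(w * dy for w, (_, dy) in zip(weights, disp)) / w_sum
        step = math.hypot(new_tx - tx, new_ty - ty)
        tx, ty = new_tx, new_ty
        if step < tol:
            break
    return tx, ty


# Eight correspondences move by (2, -1); two outliers move by (30, 20).
src = [(float(i), float(i % 3)) for i in range(10)]
dst = [(x + 2.0, y - 1.0) for x, y in src[:8]] + [(x + 30.0, y + 20.0) for x, y in src[8:]]
tx, ty = irls_translation(src, dst)
print(tx, ty)
```

In this example the plain mean of the displacements is (7.6, 3.2), while the reweighted estimate stays close to the inlier motion (2, -1), because each outlier's influence is capped by `delta`.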