🤖 AI Summary
Balancing efficiency and accuracy in 6D pose estimation is a longstanding challenge for real-time industrial applications such as automated visual inspection and robotic manipulation. This paper addresses it with a scalable, low-latency pose estimation framework. The core contribution is AMIS, an Adaptive Model Selection algorithm that dynamically optimizes the trade-off between inference speed and accuracy. Built on the GDRNPP architecture, the framework combines lightweight design principles, model scaling strategies, and task-aware inference scheduling. Evaluated on four major benchmarks (LM-O, YCB-V, T-LESS, and ITODD), the method matches state-of-the-art accuracy while accelerating inference by 3–5× and reducing average latency to the millisecond level, and it further demonstrates improved cross-dataset generalization and deployment robustness. The result is an efficient, accurate, and practically deployable 6D pose estimation system tailored to demanding real-time industrial scenarios.
📝 Abstract
In industrial applications requiring real-time feedback, such as quality control and robotic manipulation, high-speed and accurate pose estimation remains a critical demand. Despite steady progress, balancing computational efficiency against accuracy is still difficult in dynamic environments: most current algorithms cannot scale their estimation time across diverse datasets, and state-of-the-art (SOTA) methods are often too slow for real-time use. This study develops a fast and scalable family of pose estimators based on GDRNPP that meets or exceeds current benchmarks in accuracy and robustness, with particular attention to the efficiency-accuracy trade-off essential in real-time scenarios. We propose the AMIS algorithm, which selects the model best suited to an application-specific trade-off between inference time and accuracy. We demonstrate the effectiveness of AMIS-based model choice on four prominent benchmark datasets (LM-O, YCB-V, T-LESS, and ITODD).
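The abstract describes AMIS only at a high level: given a pool of pose estimators, pick the one that best matches an application-specific trade-off between inference time and accuracy. The sketch below illustrates that general idea, not the paper's actual algorithm; the model names, profile fields, and scoring rule (accuracy minus a weighted latency penalty, optionally under a hard latency budget) are all illustrative assumptions.

```python
from dataclasses import dataclass

@dataclass
class ModelProfile:
    name: str          # identifier of a candidate pose estimator (hypothetical names)
    latency_ms: float  # measured average inference latency on the target hardware
    accuracy: float    # benchmark accuracy score in [0, 1]

def select_model(profiles, latency_budget_ms=None, tradeoff=0.0):
    """Illustrative adaptive selection: keep models within an optional
    latency budget, then maximize accuracy - tradeoff * latency."""
    candidates = [
        p for p in profiles
        if latency_budget_ms is None or p.latency_ms <= latency_budget_ms
    ]
    if not candidates:
        raise ValueError("no candidate model satisfies the latency budget")
    return max(candidates, key=lambda p: p.accuracy - tradeoff * p.latency_ms)

# Hypothetical pool of scaled GDRNPP-style variants.
pool = [
    ModelProfile("pose-small", 8.0, 0.81),
    ModelProfile("pose-base", 20.0, 0.87),
    ModelProfile("pose-large", 55.0, 0.90),
]

# A strict real-time budget forces a smaller, faster model;
# a looser budget admits the more accurate one.
strict = select_model(pool, latency_budget_ms=10.0)
loose = select_model(pool, latency_budget_ms=25.0)
```

With `tradeoff=0.0` the rule degenerates to "most accurate model within budget"; a positive `tradeoff` lets an application prefer faster models even when a slower one is slightly more accurate.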