🤖 AI Summary
This work addresses a bottleneck in generalizable object pose estimation, namely its reliance on CAD models that are hard to acquire, by investigating whether 3D models reconstructed from images can substitute for them. To this end, it introduces a novel benchmark for measuring how 3D reconstruction quality affects pose estimation accuracy. The benchmark is built on the YCB-V dataset and provides calibrated images for reconstruction that are registered with the YCB-V test images, so that poses can be evaluated in the BOP benchmark format. It is used to compare classical geometric reconstruction pipelines (e.g., COLMAP SfM/MVS) and learning-based reconstruction methods in combination with state-of-the-art pose estimators. Key contributions are: (1) a reconstruction benchmark explicitly designed for pose estimation; (2) empirical evidence that classical, non-learning-based methods can perform on par with learning-based approaches and can offer a better trade-off between reconstruction time and pose accuracy; (3) the finding that standard reconstruction metrics (e.g., Chamfer distance) are not necessarily indicative of pose estimation accuracy; and (4) a demonstration that image-reconstructed models are often sufficient for accurate pose estimation, although a sizable gap to CAD models remains. Code and benchmark are publicly released.
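Chamfer distance, named above as a standard reconstruction metric, compares two point sets through nearest-neighbour distances. The sketch below assumes the symmetric variant that averages the mean distance from the reconstruction to the reference (accuracy) and from the reference to the reconstruction (completeness). Conventions differ between benchmarks (some use squared distances or sums instead of means), so this is not necessarily the exact variant used here, and the function names are hypothetical.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float, float]


def mean_nn_distance(src: Sequence[Point], dst: Sequence[Point]) -> float:
    """Mean Euclidean distance from each point in src to its nearest point in dst.

    Brute force, O(len(src) * len(dst)); real meshes would need a spatial index.
    """
    return sum(min(math.dist(p, q) for q in dst) for p in src) / len(src)


def chamfer_distance(recon: Sequence[Point], reference: Sequence[Point]) -> float:
    """Symmetric Chamfer distance (assumed convention): mean of accuracy and completeness."""
    accuracy = mean_nn_distance(recon, reference)      # recon -> reference
    completeness = mean_nn_distance(reference, recon)  # reference -> recon
    return 0.5 * (accuracy + completeness)


if __name__ == "__main__":
    recon = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
    reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
    # Accuracy is 0 (every reconstructed point lies on the reference), but the
    # missing point at x=2 raises completeness to 1/3, so the result is 1/6.
    print(chamfer_distance(recon, reference))
```

Because the score is an average over all points, it says little about which parts of the object are wrong.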
📝 Abstract
Current generalizable object pose estimators, i.e., approaches that do not need to be trained per object, rely on accurate 3D models. These are predominantly CAD models, which can be hard to obtain in practice. At the same time, it is often possible to acquire images of an object. This raises the question of whether 3D models reconstructed from images are sufficient for accurate object pose estimation. We aim to answer this question by proposing a novel benchmark for measuring the impact of 3D reconstruction quality on pose estimation accuracy. Our benchmark provides calibrated images that are suitable for reconstruction and registered with the test images of the YCB-V dataset, enabling pose evaluation in the BOP benchmark format. Detailed experiments with multiple state-of-the-art 3D reconstruction and object pose estimation approaches show that the geometry produced by modern reconstruction methods is often sufficient for accurate pose estimation. Our experiments lead to several interesting observations: (1) Standard metrics for measuring 3D reconstruction quality are not necessarily indicative of pose estimation accuracy, which shows the need for dedicated benchmarks such as ours. (2) Classical, non-learning-based approaches can perform on par with modern learning-based reconstruction techniques and can even offer a better trade-off between reconstruction time and pose accuracy. (3) There is still a sizable gap between the pose accuracy obtained with reconstructed models and with CAD models. To foster research on closing this gap, the benchmark is made available at https://github.com/VarunBurde/reconstruction_pose_benchmark.
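Pose evaluation in the BOP format scores each estimated pose with an error computed over object model vertices. As an illustration only, the sketch below implements a maximum symmetry-aware surface distance (MSSD) error and an average recall over thresholds of 0.05 to 0.5 times the object diameter, following our understanding of the BOP convention. The function names and pose representation are hypothetical; this is not the benchmark's evaluation code, and the official BOP toolkit should be used for reported numbers.

```python
import math
from typing import Optional, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Mat3 = Sequence[Sequence[float]]
Pose = Tuple[Mat3, Vec3]  # (rotation R as 3x3 row-major, translation t)

IDENTITY: Pose = (((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)), (0.0, 0.0, 0.0))


def apply(pose: Pose, x: Vec3) -> Tuple[float, ...]:
    """Rigid transform x -> R x + t."""
    R, t = pose
    return tuple(sum(R[i][j] * x[j] for j in range(3)) + t[i] for i in range(3))


def mssd(vertices: Sequence[Vec3], est: Pose, gt: Pose,
         symmetries: Optional[Sequence[Pose]] = None) -> float:
    """Max vertex distance between est and gt, minimised over object symmetries.

    Symmetries are given in the model frame; without them only the identity is used.
    """
    syms = symmetries if symmetries else [IDENTITY]
    return min(
        max(math.dist(apply(est, x), apply(gt, apply(s, x))) for x in vertices)
        for s in syms
    )


def average_recall(errors: Sequence[float], diameter: float) -> float:
    """Fraction of poses with error < threshold, averaged over 0.05d..0.5d (assumed thresholds)."""
    thresholds = [0.05 * k * diameter for k in range(1, 11)]
    recalls = [sum(e < th for e in errors) / len(errors) for th in thresholds]
    return sum(recalls) / len(recalls)


if __name__ == "__main__":
    verts = [(1.0, 0.0, 0.0), (-1.0, 0.0, 0.0)]
    flip = (((-1.0, 0.0, 0.0), (0.0, -1.0, 0.0), (0.0, 0.0, 1.0)), (0.0, 0.0, 0.0))
    # A 180-degree flip of a symmetric object is a large error unless the symmetry is declared.
    print(mssd(verts, flip, IDENTITY))                    # 2.0
    print(mssd(verts, flip, IDENTITY, [IDENTITY, flip]))  # 0.0
    # One pose with error 0.1 on an object of diameter sqrt(2): correct at 9 of 10 thresholds.
    print(average_recall([0.1], math.sqrt(2.0)))          # 0.9
```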