🤖 AI Summary
This work addresses 6D pose estimation for strawberry-harvesting robots, a task that demands high accuracy but for which annotated real-world data from agricultural fields is scarce. Existing approaches have relied solely on synthetic data that lacks scene-level realism, so their performance under real field conditions has not been quantified. To close this gap, the authors present what they describe as the first real-world 6D pose ground-truth dataset of strawberries, comprising 12,040 images captured in actual agricultural fields, alongside a synthetic dataset rendered in NVIDIA Isaac Sim with scene-level realism and domain randomization. Baseline 6D pose estimation results across backbone encoders show that a significant sim-to-real gap persists even with this synthetic data. The findings underscore the need for real field data for reliable evaluation in agricultural robotics and provide a reference for future work; the real-world dataset is to be released upon acceptance.
📝 Abstract
Robotic strawberry harvesting requires precise 6D pose estimation; however, collecting 6D pose ground truth in real agricultural fields is inherently challenging. Existing 6D pose estimation methods have therefore relied solely on synthetic data that lacks scene-level realism, leaving their performance under real agricultural field conditions unquantified. In this work, we present, to the best of our knowledge, the first real-world 6D pose ground truth dataset of strawberries collected in actual agricultural fields (12,040 images). We also introduce a synthetic dataset rendered in NVIDIA Isaac Sim, featuring scene-level realism and domain randomization. Even with this synthetic data, our experiments reveal that a significant sim-to-real gap persists, underscoring the necessity of real agricultural field data for reliable evaluation. We further quantify the sim-to-real gap through baseline 6D pose estimation results across backbone encoders, which serve as a reference for future work. The real-world dataset will be made available upon acceptance.