🤖 AI Summary
This work addresses the problem of keeping produce in a consistent orientation when robots grasp and place fruits and vegetables during post-harvest handling. It proposes a geometry-based, orientation-aware method built on the geometry of the root surface. A YOLO-based root detector is combined with plane fitting on the RGB-D point cloud to estimate the root normal, so orientation is obtained without relying on vision-language models. The estimated normal is then used to generate stable grasp poses that satisfy the orientation constraint, together with Cartesian-space motion plans. In experiments on tomatoes and onions in both isolated and cluttered scenarios, the system achieves high grasp success rates and stable execution times. Compared with vision-language-action (VLA) policies, it completes grasps more reliably and accurately and executes faster.
📝 Abstract
Orientation-aware manipulation is essential in post-harvest agricultural processing, where produce must be grasped and placed in consistent configurations. This paper presents ROG-Grasp, a geometry-based robotic grasping and placement framework that estimates produce orientation from root surface geometry using RGB-D perception. A YOLO-based root detector and point cloud plane fitting are used to infer the root normal, enabling stable grasp pose generation and orientation-constrained Cartesian motion planning. Experiments on tomatoes and onions demonstrate high success rates and stable execution times in both isolated and cluttered scenarios. Compared with vision-language-action (VLA) policies, the proposed method achieves more reliable and accurate grasp completion with faster execution. These results highlight the effectiveness of geometry-driven perception for practical orientation-controlled manipulation tasks. A video accompanying the paper is available online at https://youtu.be/Ir2UtGODdMo.
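The abstract does not say how the plane is fitted to the root region, so the following is only a minimal sketch of one common option: a least-squares plane fit via SVD on the 3D points that fall inside a detected root bounding box. The function name `fit_plane_normal`, the `toward` parameter, and the choice to flip the normal toward the camera origin are illustrative assumptions, not details taken from ROG-Grasp.

```python
import numpy as np


def fit_plane_normal(points, toward=(0.0, 0.0, 0.0)):
    """Least-squares plane fit to an (N, 3) point array.

    Returns (centroid, unit_normal). The normal is flipped so that it
    points toward `toward` (assumed here to be the camera origin).
    This is a hypothetical helper, not the paper's implementation.
    """
    pts = np.asarray(points, dtype=float)
    if pts.ndim != 2 or pts.shape[1] != 3 or pts.shape[0] < 3:
        raise ValueError("expected at least three 3D points, shape (N, 3)")

    centroid = pts.mean(axis=0)

    # The right singular vector with the smallest singular value is the
    # direction of least variance, i.e. the normal of the best-fit plane.
    _, _, vt = np.linalg.svd(pts - centroid, full_matrices=False)
    normal = vt[-1] / np.linalg.norm(vt[-1])

    # Resolve the sign ambiguity: make the normal face the viewpoint.
    if np.dot(normal, np.asarray(toward, dtype=float) - centroid) < 0:
        normal = -normal
    return centroid, normal


if __name__ == "__main__":
    # Synthetic root patch: a flat 3x3 grid half a metre in front of the camera.
    patch = np.array(
        [[x, y, 0.5] for x in (-0.1, 0.0, 0.1) for y in (-0.1, 0.0, 0.1)]
    )
    centroid, normal = fit_plane_normal(patch)
    print("centroid:", centroid, "normal:", normal)
```

On real depth data the root patch will contain outliers, so a robust variant (for example RANSAC around this fit) would likely be needed; that choice is likewise an assumption rather than something the abstract states.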