🤖 AI Summary
Strawberry clusters cause severe occlusion, making it difficult for conventional robotic harvesting systems to localize stem-cutting points accurately and to avoid obstacles dexterously amid dense vegetation. To address this, we propose a visuomotor harvesting method that learns from human demonstrations. The approach introduces an end-effector pose-augmented Action Chunking with Transformers (ACT) model that decouples the harvesting motion into two phases, pose-guided global positioning and local manipulation, to improve generalization and robustness under occlusion. A 4-DoF SCARA manipulator with master–slave teleoperation is used to collect high-fidelity demonstration data, enabling end-to-end visuomotor policy learning. Experiments show a 23.6% improvement in harvesting success rate over standard ACT under multi-level occlusion, along with strong adaptability to soft-branch perturbations. This work offers a scalable technical pathway for harvesting delicate fruit in complex agricultural environments.
📝 Abstract
Strawberries naturally grow in clusters, interwoven with leaves, stems, and other fruits, which frequently leads to occlusion. This growth habit presents a significant challenge for robotic picking, as traditional perception-planning-control systems struggle to reach fruits amid the clutter. Picking an occluded strawberry demands dexterous manipulation: the robot must carefully bypass or gently move the surrounding soft objects and precisely reach the ideal picking point, located on the stem just above the calyx. To address this challenge, we introduce a strawberry-picking robotic system that learns from human demonstrations. Our system features a 4-DoF SCARA arm paired with a human teleoperation interface for efficient data collection, and it uses an End Pose Assisted variant of Action Chunking with Transformers (ACT) to learn a fine-grained visuomotor picking policy. Experiments under various occlusion scenarios demonstrate that our modified approach significantly outperforms a direct implementation of ACT, underscoring its potential for practical application in occluded strawberry picking.