AI Summary
To address the longstanding trade-off between precision and efficiency in high-accuracy dexterous manipulation (e.g., needle threading), this work draws inspiration from human dual-resolution visuomotor control and introduces, for the first time, foveation-driven dual-resolution perception into robotic imitation learning. Methodologically, we integrate eye-tracking with region-adaptive image sampling to construct a two-branch convolutional network: a low-resolution peripheral branch for rapid coarse localization and a high-resolution foveal branch for sub-millimeter fine positioning. We employ behavior cloning-based deep imitation learning for end-to-end policy learning. Evaluated on a general-purpose robotic arm, our approach achieves 0.3 mm positioning accuracy in needle threading, improves inference speed by 42%, and reduces computational overhead by 58%. The framework effectively decouples coarse and fine control stages, thereby simultaneously ensuring real-time performance and sub-millimeter precision.
Abstract
A high-precision manipulation task, such as needle threading, is challenging. Physiological studies have proposed connecting low-resolution peripheral vision and fast movement to transport the hand into the vicinity of an object, and using high-resolution foveated vision to achieve the accurate homing of the hand to the object. The results of this study demonstrate that a deep imitation learning based method, inspired by the gaze-based dual resolution visuomotor control system in humans, can solve the needle threading task. First, we recorded the gaze movements of a human operator who was teleoperating a robot. Then, we used only a high-resolution image around the gaze to precisely control the thread position when it was close to the target. We used a low-resolution peripheral image to reach the vicinity of the target. The experimental results obtained in this study demonstrate that the proposed method enables precise manipulation tasks using a general-purpose robot manipulator and improves computational efficiency.
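The dual-resolution input described above can be illustrated with a minimal sketch: a full-resolution patch cropped around the gaze point for fine control, and a downsampled view of the whole frame for coarse reaching. The function names, the crop size, the pooling factor, and the use of average pooling for downsampling are all assumptions made for illustration; the abstract does not specify how either image is produced. Plain nested lists are used so the example has no dependencies.

```python
from typing import List, Tuple

Image = List[List[float]]  # grayscale image as rows of pixel intensities


def foveal_crop(image: Image, gaze: Tuple[int, int], size: int) -> Image:
    """Return a full-resolution size x size patch centered on gaze (row, col).

    Near the borders the window is shifted inward so it stays inside the image.
    The crop size is a hypothetical parameter, not a value from the paper.
    """
    height, width = len(image), len(image[0])
    if size > height or size > width:
        raise ValueError("crop size exceeds image dimensions")
    row, col = gaze
    top = min(max(row - size // 2, 0), height - size)
    left = min(max(col - size // 2, 0), width - size)
    return [r[left:left + size] for r in image[top:top + size]]


def peripheral_view(image: Image, factor: int) -> Image:
    """Return a low-resolution view of the whole image.

    Uses factor x factor average pooling (an assumed downsampling scheme);
    rows and columns that do not fill a complete block are dropped.
    """
    height, width = len(image), len(image[0])
    pooled = []
    for top in range(0, height - factor + 1, factor):
        pooled_row = []
        for left in range(0, width - factor + 1, factor):
            block = [image[top + i][left + j]
                     for i in range(factor) for j in range(factor)]
            pooled_row.append(sum(block) / (factor * factor))
        pooled.append(pooled_row)
    return pooled


# Toy 8x8 frame whose pixel value encodes its position (row * 8 + col).
frame = [[float(r * 8 + c) for c in range(8)] for r in range(8)]
fovea = foveal_crop(frame, gaze=(1, 6), size=4)   # 4x4 patch, full resolution
periphery = peripheral_view(frame, factor=4)      # 2x2 view of the whole frame
```

Under this sketch, the policy would see `periphery` while moving toward the target and `fovea` once the thread is close to it. The foveal patch covers only a small part of the frame, which is one plausible reading of where the reported efficiency gain comes from.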