🤖 AI Summary
This work addresses the dexterous object handover task between dual robotic arms in human-robot collaboration scenarios. To overcome large rotational errors and poor generalization inherent in conventional rotation representations, we propose a reinforcement learning framework grounded in dual quaternions. A novel reward function is designed to explicitly encode pose-coupling constraints, significantly reducing rotational distance error. By integrating multi-finger hand dynamics modeling and cross-distribution training, the policy achieves enhanced robustness against unseen objects and motion disturbances from collaborative robots. Experiments demonstrate a 94% success rate on the standard test set and only a 13.8% performance drop under dynamic perturbations—validating strong generalization and robustness. The core contribution lies in the first application of dual quaternions to RL reward design for dexterous handover, unifying high-precision pose control with cross-task transferability.
📝 Abstract
Object handover is an important skill that we use daily when interacting with other humans. To deploy robots in collaborative setting, like houses, being able to receive and handing over objects safely and efficiently becomes a crucial skill. In this work, we demonstrate the use of Reinforcement Learning (RL) for dexterous object handover between two multi-finger hands. Key to this task is the use of a novel reward function based on dual quaternions to minimize the rotation distance, which outperforms other rotation representations such as Euler and rotation matrices. The robustness of the trained policy is experimentally evaluated by testing w.r.t. objects that are not included in the training distribution, and perturbations during the handover process. The results demonstrate that the trained policy successfully perform this task, achieving a total success rate of 94% in the best-case scenario after 100 experiments, thereby showing the robustness of our policy with novel objects. In addition, the best-case performance of the policy decreases by only 13.8% when the other robot moves during the handover, proving that our policy is also robust to this type of perturbation, which is common in real-world object handovers.