Transformer-based deep imitation learning for dual-arm robot manipulation

📅 2021-08-01
🏛️ IEEE/RJS International Conference on Intelligent RObots and Systems
📈 Citations: 48
Influential: 0
📄 PDF
🤖 AI Summary
High-dimensional state inputs in bimanual robotic dexterous manipulation induce neural network interference and degrade policy performance. Method: This paper proposes a Transformer-based deep imitation learning framework tailored for real-world deployment. It introduces self-attention mechanisms into bimanual robotic imitation learning for the first time, enabling cross-arm state dependency modeling and dynamic focus on salient perceptual features. The approach fuses multimodal sensor time-series inputs with learnable self-attention weights, eliminating the need for explicit environment modeling or pre-programmed behaviors. Contribution/Results: Evaluated on a physical bimanual robot platform, the method significantly suppresses input interference, achieving a 23.5% absolute improvement in task success rate over attention-free baselines. It establishes a scalable, end-to-end perception–decision paradigm for multi-arm collaborative learning.
📝 Abstract
Deep imitation learning is promising for solving dexterous manipulation tasks because it does not require an environment model and pre-programmed robot behavior. However, its application to dual-arm manipulation tasks remains challenging. In a dual-arm manipulation setup, the increased number of state dimensions caused by the additional robot manipulators causes distractions and results in poor performance of the neural networks. We address this issue using a self-attention mechanism that computes dependencies between elements in a sequential input and focuses on important elements. A Transformer, a variant of self-attention architecture, is applied to deep imitation learning to solve dual-arm manipulation tasks in the real world. The proposed method has been tested on dual-arm manipulation tasks using a real robot. The experimental results demonstrated that the Transformer-based deep imitation learning architecture can attend to the important features among the sensory inputs, therefore reducing distractions and improving manipulation performance when compared with the baseline architecture without the self-attention mechanisms.
Problem

Research questions and friction points this paper is trying to address.

Addressing poor performance in dual-arm robot manipulation tasks
Reducing distractions from high-dimensional state spaces
Improving feature attention in deep imitation learning
Innovation

Methods, ideas, or system contributions that make the work stand out.

Transformer-based deep imitation learning for robots
Self-attention mechanism reduces sensory distractions
Dual-arm manipulation improved with attention architecture
🔎 Similar Papers
No similar papers found.
Heecheol Kim
Heecheol Kim
Laboratory for Intelligent Systems and Informatics, Graduate School of Information Science and Technology, The University of Tokyo
Y
Y. Ohmura
Laboratory for Intelligent Systems and Informatics, Graduate School of Information Science and Technology, The University of Tokyo
Y
Y. Kuniyoshi
Laboratory for Intelligent Systems and Informatics, Graduate School of Information Science and Technology, The University of Tokyo