🤖 AI Summary
To address high task diversity and weak failure-recovery robustness in long-horizon, multi-stage robotic manipulation, this paper proposes ROMAN, a hybrid hierarchical learning framework. ROMAN dynamically schedules reconfigurable, task-specialized expert networks to enable modular task orchestration and autonomous failure recovery. It unifies behavior cloning, inverse reinforcement learning, and policy-gradient-based reinforcement learning, and supports plug-and-play integration of subtasks. By combining multi-expert ensembling with hierarchical action-sequence modeling, ROMAN improves stability under sensory noise and generalization to unseen task configurations. Reported results include higher success rates on fine-grained manipulation tasks, robust performance under visual and proprioceptive noise, generalization to operation sequences beyond the scope of the demonstrations, and an autonomous failure recovery rate of 92.7%.
📝 Abstract
Solving long sequential tasks poses a significant challenge in embodied artificial intelligence. Enabling a robotic system to perform diverse sequential tasks with a broad range of manipulation skills is an active area of research. In this work, we present a Hybrid Hierarchical Learning framework, the Robotic Manipulation Network (ROMAN), to address the challenge of solving multiple complex tasks over long time horizons in robotic manipulation. ROMAN achieves task versatility and robust failure recovery by integrating behavioural cloning, imitation learning, and reinforcement learning. It consists of a central manipulation network that coordinates an ensemble of various neural networks, each specialising in distinct re-combinable sub-tasks to generate their correct in-sequence actions for solving complex long-horizon manipulation tasks. Experimental results show that by orchestrating and activating these specialised manipulation experts, ROMAN generates correct sequential activations for accomplishing long sequences of sophisticated manipulation tasks and achieving adaptive behaviours beyond demonstrations, while exhibiting robustness to various sensory noises. These results demonstrate the significance and versatility of ROMAN's dynamic adaptability featuring autonomous failure recovery capabilities, and highlight its potential for various autonomous manipulation tasks that demand adaptive motor skills.
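The coordination idea described in the abstract, a central network activating specialised experts in sequence, can be illustrated with a minimal sketch. This is not the authors' implementation: the softmax weighting, the function names (`coordinate`, `toy_gate`, `reach_expert`, `grasp_expert`), the two-dimensional action, and the single "distance" observation are all assumptions made for illustration, and the hand-written gate stands in for what would be a learned network.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
Policy = Callable[[Sequence[float]], Vector]


def softmax(scores: Sequence[float]) -> Vector:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def coordinate(observation: Sequence[float],
               experts: Sequence[Policy],
               gate: Policy) -> Tuple[Vector, Vector]:
    """Blend expert actions using weights produced by a central gate.

    Assumption: the gate returns one score per expert and the final
    action is the weighted sum of the expert actions.
    """
    weights = softmax(gate(observation))
    actions = [expert(observation) for expert in experts]
    dim = len(actions[0])
    blended = [sum(w * a[i] for w, a in zip(weights, actions))
               for i in range(dim)]
    return blended, weights


# Hypothetical experts: each always proposes its own fixed action.
def reach_expert(obs: Sequence[float]) -> Vector:
    return [1.0, 0.0]  # move towards the object


def grasp_expert(obs: Sequence[float]) -> Vector:
    return [0.0, 1.0]  # close the gripper


def toy_gate(obs: Sequence[float]) -> Vector:
    """Hand-written stand-in for a learned gate.

    obs[0] is a hypothetical normalised distance to the object:
    far away favours reaching, close by favours grasping.
    """
    distance = obs[0]
    return [10.0 * distance, 10.0 * (1.0 - distance)]


if __name__ == "__main__":
    experts = [reach_expert, grasp_expert]
    for distance in (0.9, 0.1, 0.9):  # far, near, then far again
        action, weights = coordinate([distance], experts, toy_gate)
        print(distance, [round(w, 3) for w in weights], action)
```

In this toy setting, the third observation (the object is far away again, for example after a dropped grasp) shifts the weight back to the reaching expert. Because the gate is re-evaluated at every step, this is one simple way a recovery behaviour could emerge from sequencing alone; how ROMAN achieves it in practice is not specified in the abstract.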