A Backbone for Long-Horizon Robot Task Understanding

📅 2024-08-02
🏛️ IEEE Robotics and Automation Letters
📈 Citations: 2
Influential: 0
🤖 AI Summary
To address the poor interpretability, weak generalization, and unpredictable outcomes of end-to-end learning in long-horizon robotic tasks, this paper proposes the Therblig-Based Backbone Framework (TBBF). First, it introduces a task decomposition paradigm grounded in therblig motion primitives, which splits a demonstration into semantically interpretable temporal segments. Second, it designs the Meta-RGate SynerFusion (MGSF) network for therblig segmentation and an Action Registration (ActionREG) module, which together support generalization from a one-shot demonstration. Third, it integrates an LLM-Alignment Policy for Visual Correction (LAP-VC), which leverages vision-language feedback for online behavioral refinement. Experiments demonstrate a therblig segmentation recall of 94.37%. On real robots, TBBF achieves task success rates of 94.4% in simple scenes and 80% in complex scenes, substantially improving robustness and cross-task generalization over prior end-to-end approaches.

📝 Abstract
End-to-end robot learning, particularly for long-horizon tasks, often results in unpredictable outcomes and poor generalization. To address these challenges, we propose a novel Therblig-Based Backbone Framework (TBBF) as a fundamental structure to enhance interpretability, data efficiency, and generalization in robotic systems. TBBF utilizes expert demonstrations to enable therblig-level task decomposition, facilitate efficient action-object mapping, and generate adaptive trajectories for new scenarios. The approach consists of two stages: offline training and online testing. During the offline training stage, we develop the Meta-RGate SynerFusion (MGSF) network for accurate therblig segmentation across various tasks. In the online testing stage, after a one-shot demonstration of a new task is collected, our MGSF network extracts high-level knowledge, which is then encoded into the image using Action Registration (ActionREG). Additionally, a Large Language Model (LLM)-Alignment Policy for Visual Correction (LAP-VC) is employed to ensure precise action registration, facilitating trajectory transfer in novel robot scenarios. Experimental results validate these methods, achieving 94.37% recall in therblig segmentation and success rates of 94.4% and 80% in real-world online robot testing for simple and complex scenarios, respectively.
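As a rough illustration of what therblig-level decomposition and a segmentation recall figure can look like in code, the sketch below collapses per-frame labels into contiguous segments and computes a frame-wise recall for one label. It is not the MGSF network or the paper's evaluation protocol: the label names (`reach`, `grasp`, `move`, `release`), the per-frame representation, and the helper names `Segment`, `frames_to_segments`, and `framewise_recall` are all assumptions made for this example.

```python
from dataclasses import dataclass
from itertools import groupby


@dataclass(frozen=True)
class Segment:
    label: str  # therblig name (hypothetical label set, not taken from the paper)
    start: int  # first frame index, inclusive
    end: int    # last frame index, inclusive


def frames_to_segments(frame_labels):
    """Collapse a sequence of per-frame labels into contiguous labelled segments."""
    segments = []
    index = 0
    for label, run in groupby(frame_labels):
        length = len(list(run))
        segments.append(Segment(label, index, index + length - 1))
        index += length
    return segments


def framewise_recall(predicted, reference, label):
    """Fraction of reference frames carrying `label` that were predicted as `label`.

    Returns None when the reference contains no frame with that label.
    This frame-wise definition is an assumption; the paper may measure recall differently.
    """
    relevant = [p for p, r in zip(predicted, reference) if r == label]
    if not relevant:
        return None
    return sum(p == label for p in relevant) / len(relevant)


# A made-up seven-frame demonstration and a made-up prediction for it.
reference = ["reach", "reach", "grasp", "move", "move", "move", "release"]
predicted = ["reach", "grasp", "grasp", "move", "move", "release", "release"]

segments = frames_to_segments(reference)
move_recall = framewise_recall(predicted, reference, "move")
```

Here `segments` holds four entries, one per contiguous run of the same label, and `move_recall` is the share of the three reference `move` frames that the prediction also labels `move`.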
Problem

Research questions and friction points this paper is trying to address.

Poor interpretability and generalization in end-to-end robot learning
Low data efficiency in long-horizon robotic tasks
Adaptive trajectory generation for new scenarios
Innovation

Methods, ideas, or system contributions that make the work stand out.

Therblig-Based Backbone Framework enhances robot task understanding
Meta-RGate SynerFusion (MGSF) network for accurate therblig segmentation
LLM-Alignment Policy for Visual Correction (LAP-VC) ensures precise action registration
Xiaoshuai Chen
Department of Dyson School of Engineering, Imperial College London, London, UK
Wei Chen
Department of Dyson School of Engineering, Imperial College London, London, UK
Dongmyoung Lee
Postdoctoral Researcher, TU Wien
Robotics, Robot Manipulation, Gripper Design and Control
Yukun Ge
Department of Dyson School of Engineering, Imperial College London, London, UK
Nicolás Rojas
The AI Institute, Cambridge, MA, USA
Petar Kormushev
Imperial College London, Director of Robot Intelligence Lab
Robot Intelligence, Robot Learning, Reinforcement Learning, Machine Learning, Robotics