Few-Shot Vision-Language Action-Incremental Policy Learning

📅 2025-04-22
🤖 AI Summary
This work addresses catastrophic forgetting in robotic manipulation when new tasks keep arriving and each comes with only a few demonstrations. To this end, we formulate Few-Shot Action-Incremental Learning (FSAIL), a new problem setting for incremental skill acquisition, and propose the Task-prOmpt graPh evolutIon poliCy (TOPIC), presented as the first few-shot continual learning framework for robotic action skills. TOPIC builds on Transformer-based policies that combine multi-view spatial representations with language instructions, and adds two components: Task-Specific Prompts (TSP), learned through deep interaction between the text and visual information in the few-shot demonstrations, and a Continuous Evolution Strategy (CES), which constructs a task relation graph so that new tasks can reuse skills learned on earlier ones. In extensive experiments, TOPIC outperforms state-of-the-art baselines by over 26% in success rate, improving both the few-shot adaptability and the continual-learning robustness of vision-language policies.

πŸ“ Abstract
Recently, Transformer-based robotic manipulation methods have used multi-view spatial representations and language instructions to learn robot motion trajectories from large numbers of robot demonstrations. However, collecting robot data is extremely challenging, and existing methods cannot continually learn new tasks from only a few demonstrations. In this paper, we formulate these challenges as the Few-Shot Action-Incremental Learning (FSAIL) task and design a Task-prOmpt graPh evolutIon poliCy (TOPIC) to address them. Specifically, to address data scarcity in robotic imitation learning, TOPIC learns Task-Specific Prompts (TSP) through deep interaction of multi-modal information within the few-shot demonstrations, thereby extracting task-specific discriminative information. To enhance continual learning on new tasks and mitigate catastrophic forgetting, TOPIC adopts a Continuous Evolution Strategy (CES). CES leverages the intrinsic relationships between tasks to construct a task relation graph, which facilitates adaptation to new tasks by reusing skills learned on previous tasks. TOPIC pioneers few-shot continual learning in robotic manipulation, and extensive experiments show that TOPIC outperforms state-of-the-art baselines by over 26% in success rate, significantly enhancing the continual learning capabilities of existing Transformer-based policies.
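The abstract defines FSAIL only informally: new tasks arrive over time, each with a few demonstrations, and the policy must keep performing on earlier tasks. A minimal sketch of such a schedule follows, assuming a data-rich base session followed by K-shot incremental sessions, with evaluation on every task seen so far after each session. The base session, the session sizes, and the names `Session`, `build_fsail_schedule`, `seen_tasks_per_session`, and `average_success` are hypothetical and do not describe the paper's actual benchmark split.

```python
from dataclasses import dataclass


@dataclass
class Session:
    """One learning stage: the tasks introduced and the demonstrations per task."""
    tasks: list
    demos_per_task: int


def build_fsail_schedule(base_tasks, novel_tasks, tasks_per_session, k_shot, base_demos):
    """Hypothetical FSAIL schedule: one data-rich base session, then K-shot sessions."""
    sessions = [Session(list(base_tasks), base_demos)]
    for i in range(0, len(novel_tasks), tasks_per_session):
        sessions.append(Session(list(novel_tasks[i:i + tasks_per_session]), k_shot))
    return sessions


def seen_tasks_per_session(sessions):
    """After each session the policy is evaluated on every task seen so far."""
    seen, result = [], []
    for s in sessions:
        seen.extend(s.tasks)
        result.append(list(seen))
    return result


def average_success(success_rates, tasks):
    """Mean success rate over a set of tasks (dict: task name -> rate in [0, 1])."""
    return sum(success_rates[t] for t in tasks) / len(tasks)
```

For example, `build_fsail_schedule(["open_drawer", "push_button"], ["stack_blocks", "close_jar", "sweep_to_dustpan"], tasks_per_session=2, k_shot=5, base_demos=100)` yields three sessions with hypothetical task names; a drop in `average_success` on the base tasks across later sessions is the forgetting that CES is meant to mitigate.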
Problem

Research questions and friction points this paper is trying to address.

Addresses few-shot learning for robot tasks with minimal demonstrations
Enhances continual learning to prevent forgetting previous task skills
Improves robotic manipulation using multi-modal task-specific prompts
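The page does not specify how the "deep interaction" behind the task-specific prompts works. A minimal sketch, assuming bidirectional cross-attention: language tokens attend to visual tokens and vice versa, each result is mean-pooled, the two pooled vectors are summed into one vector per demonstration, and the vectors of the K demonstrations are averaged into a task prompt. Learned query/key/value projections, multiple heads, and training of the prompts with the policy loss are all omitted, and the function names are hypothetical.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attend(queries, context):
    """Scaled dot-product attention; context tokens act as both keys and values."""
    d = len(queries[0])
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, c)) / math.sqrt(d) for c in context]
        w = softmax(scores)
        out.append([sum(wi * c[j] for wi, c in zip(w, context)) for j in range(d)])
    return out


def mean_pool(tokens):
    """Average a list of equal-length vectors."""
    return [sum(t[j] for t in tokens) / len(tokens) for j in range(len(tokens[0]))]


def demo_prompt(text_tokens, visual_tokens):
    """Fuse one demonstration: text attends to vision, and vision attends to text."""
    text_ctx = cross_attend(text_tokens, visual_tokens)
    vis_ctx = cross_attend(visual_tokens, text_tokens)
    return [a + b for a, b in zip(mean_pool(text_ctx), mean_pool(vis_ctx))]


def task_specific_prompt(demos):
    """Average the fused vectors of the K few-shot (text, visual) demonstrations."""
    return mean_pool([demo_prompt(text, vis) for text, vis in demos])
```

Attending in both directions lets the instruction pick out the task-relevant parts of the scene while the scene grounds the instruction, which is one plausible way to extract discriminative task information from very few demonstrations.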
Innovation

Methods, ideas, or system contributions that make the work stand out.

Task-Specific Prompts for few-shot learning
Continuous Evolution Strategy for task adaptation
Task relation graph for skill reuse
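How CES builds the task relation graph and reuses skills is not detailed here. A minimal sketch follows, assuming tasks are nodes keyed by prompt vectors and a new task is linked to its `top_k` most cosine-similar earlier tasks (positive similarity only). Its prompt is a blend, controlled by `alpha`, of its own few-shot estimate and a softmax-weighted (by `temperature`) sum of its neighbours' prompts. Earlier prompts are never overwritten, which is our assumed way of avoiding forgetting. `TaskRelationGraph` and all its parameters are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    if na == 0 or nb == 0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


class TaskRelationGraph:
    def __init__(self, top_k=2, temperature=0.1, alpha=0.5):
        self.prompts = {}  # task name -> prompt vector (frozen once added)
        self.edges = {}    # task name -> list of (neighbour name, weight)
        self.top_k, self.temperature, self.alpha = top_k, temperature, alpha

    def add_task(self, name, few_shot_prompt):
        # Rank earlier tasks by similarity and keep the top_k positive ones.
        ranked = sorted(((cosine(few_shot_prompt, p), t) for t, p in self.prompts.items()),
                        reverse=True)[: self.top_k]
        ranked = [(s, t) for s, t in ranked if s > 0]
        if not ranked:
            prompt, self.edges[name] = list(few_shot_prompt), []
        else:
            # Softmax over similarities gives the edge weights.
            exps = [math.exp(s / self.temperature) for s, _ in ranked]
            z = sum(exps)
            weights = [(t, e / z) for (_, t), e in zip(ranked, exps)]
            # Reuse skills: weighted sum of neighbouring prompts.
            reused = [sum(w * self.prompts[t][j] for t, w in weights)
                      for j in range(len(few_shot_prompt))]
            prompt = [self.alpha * x + (1 - self.alpha) * r
                      for x, r in zip(few_shot_prompt, reused)]
            self.edges[name] = weights
        self.prompts[name] = prompt  # earlier tasks' prompts are left untouched
        return prompt
```

Keeping earlier prompts frozen is a design choice of this sketch. Old tasks keep exactly the conditioning they were trained with, but they never benefit from tasks learned later.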
Mingchen Song
School of Computer Science and Technology, Harbin Institute of Technology (Shenzhen), China, 518055
Xiang Deng
Scale AI
Machine Learning, NLP, Knowledge Graphs, Semantic Web
Guoqiang Zhong
Department of Computer Science and Technology, Ocean University of China
Machine Learning
Qi Lv
School of Computer Science and Technology, Harbin Institute of Technology (Shenzhen), China, 518055
Jia Wan
PhD student in EECS, MIT
statistics, reinforcement learning, inference, combinatorial optimization
Yinchuan Li
Principal Researcher, Noah's Ark Lab
Generative Models, Embodied AI, Artificial Intelligence
Jianye Hao
Huawei Noah's Ark Lab/Tianjin University
Multiagent Systems, Embodied AI
Weili Guan
School of Computer Science and Technology, Harbin Institute of Technology (Shenzhen), China, 518055