Robotic Policy Adaptation via Weight-Space Meta-Learning

📅 2026-06-05

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the high cost and limited scalability of adapting current vision-language-action (VLA) models, which typically requires task-specific demonstrations, action annotations, and fine-tuning for each new task. The authors propose WIZARD, a novel framework that uses meta-learning in weight space to generate task-specific LoRA parameters in a single forward pass from only a language instruction and a short demonstration video, without target-task action labels or test-time optimization. By combining a frozen VLA backbone, LoRA adapters, and a video-language alignment mechanism, WIZARD maps task evidence directly to expert LoRA updates, so a new task needs no additional fine-tuning. On the LIBERO benchmark, it improves performance by up to ~2× on unseen dataset collections and up to ~14× on unseen tasks. On a Franka Emika Panda robot, it consistently improves over a real-domain adapted baseline.

📝 Abstract

Vision-Language-Action (VLA) models are emerging as a promising paradigm for robotic manipulation, enabling general-purpose policies trained from large corpora of demonstrations and action labels. However, adapting these models to new tasks still typically requires task-specific demonstrations, action annotations, and additional fine-tuning, making deployment costly and difficult to scale. We propose WIZARD, a weight-space meta-learning framework that sidesteps task-specific fine-tuning by generating task-specific LoRA parameters for a frozen VLA policy. Given only a language instruction and a short demonstration video, WIZARD predicts the corresponding adaptation weights in a single forward pass, without target-task action labels or test-time optimization. During meta-training, WIZARD learns to map task evidence directly to expert LoRA updates, capturing relationships between tasks in weight space. Experiments on LIBERO show that WIZARD improves performance by up to ~2x on unseen dataset collections and up to ~14x on unseen tasks. On a Franka Emika Panda, WIZARD consistently improves over a real-domain adapted baseline, showing that generated adapters provide task-level specialization beyond simulation.
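
The core idea in the abstract, predicting LoRA weights for a frozen layer from task evidence in one forward pass, can be illustrated with a toy sketch. Everything below is an assumption made for illustration and is not taken from the paper: the `LoRAGenerator` class, the single linear projection it uses, the concatenation of precomputed language and video embeddings, the tiny dimensions, and the mean-squared-error regression toward expert LoRA weights are all hypothetical stand-ins for whatever architecture and loss WIZARD actually uses.

```python
import random


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


class LoRAGenerator:
    """Hypothetical generator: one linear map from a task embedding to the
    flattened LoRA factors A (rank x d_in) and B (d_out x rank)."""

    def __init__(self, embed_dim, d_in, d_out, rank, seed=0):
        rng = random.Random(seed)
        self.d_in, self.d_out, self.rank = d_in, d_out, rank
        n_params = rank * d_in + d_out * rank
        # Learnable in a real system; randomly initialised here.
        self.proj = [[rng.gauss(0.0, 0.1) for _ in range(embed_dim)]
                     for _ in range(n_params)]

    def __call__(self, task_embedding):
        # Single forward pass: no gradient steps on the target task.
        flat = matvec(self.proj, task_embedding)
        split = self.rank * self.d_in
        a_flat, b_flat = flat[:split], flat[split:]
        lora_a = [a_flat[i * self.d_in:(i + 1) * self.d_in]
                  for i in range(self.rank)]
        lora_b = [b_flat[i * self.rank:(i + 1) * self.rank]
                  for i in range(self.d_out)]
        return lora_a, lora_b


def adapted_forward(x, w_frozen, lora_a, lora_b, scale=1.0):
    """Frozen layer plus low-rank update: y = W x + scale * B (A x)."""
    base = matvec(w_frozen, x)
    delta = matvec(lora_b, matvec(lora_a, x))
    return [b + scale * d for b, d in zip(base, delta)]


def lora_regression_loss(generated, expert):
    """Assumed meta-training signal: mean squared error between generated
    and expert LoRA factors (each a tuple (A, B))."""
    flat_g = [v for m in generated for row in m for v in row]
    flat_e = [v for m in expert for row in m for v in row]
    return sum((g - e) ** 2 for g, e in zip(flat_g, flat_e)) / len(flat_g)


# Toy usage: embeddings would come from language and video encoders.
language_emb = [0.2, -0.1]
video_emb = [0.5, 0.3]
generator = LoRAGenerator(embed_dim=4, d_in=3, d_out=2, rank=1)
lora_a, lora_b = generator(language_emb + video_emb)

w_frozen = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]  # stays untouched
action = adapted_forward([1.0, 2.0, 3.0], w_frozen, lora_a, lora_b)
```

In this sketch the frozen weight `w_frozen` is never modified; only the generated low-rank factors change per task, which is what lets adaptation happen without test-time optimization. During meta-training, the generator's parameters would be the only thing updated, for example by minimising `lora_regression_loss` against LoRA weights obtained from conventionally fine-tuned expert adapters.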

Problem

Research questions and friction points this paper is trying to address.

robotic policy adaptation

Vision-Language-Action models

task-specific fine-tuning

demonstration-efficient learning

zero-shot adaptation

Innovation

Methods, ideas, or system contributions that make the work stand out.

weight-space meta-learning

Vision-Language-Action (VLA)

LoRA adaptation