Language Movement Primitives: Grounding Language Models in Robot Motion

📅 2026-02-02
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work proposes the Language Movement Primitives (LMP) framework to bridge the semantic gap between natural language instructions and low-level robot motion control. By coupling vision-language models (VLMs) with Dynamic Movement Primitives (DMPs), LMP maps high-level task reasoning directly into the small, interpretable DMP parameter space, enabling the generation of continuous, stable, and diverse manipulation trajectories. The approach supports zero-shot generalization: robots can execute tabletop tasks directly from natural language without task-specific training. Evaluated on 20 real-world tasks, LMP achieves an 80% success rate, substantially outperforming the best baseline (31%) and demonstrating strong effectiveness and generalization capability.

📝 Abstract
Enabling robots to perform novel manipulation tasks from natural language instructions remains a fundamental challenge in robotics, despite significant progress in generalized problem solving with foundational models. Large vision and language models (VLMs) are capable of processing high-dimensional input data for visual scene and language understanding, as well as decomposing tasks into a sequence of logical steps; however, they struggle to ground those steps in embodied robot motion. On the other hand, robotics foundation models output action commands, but require in-domain fine-tuning or experience before they are able to perform novel tasks successfully. At its core, there still remains the fundamental challenge of connecting abstract task reasoning with low-level motion control. To address this disconnect, we propose Language Movement Primitives (LMPs), a framework that grounds VLM reasoning in Dynamic Movement Primitive (DMP) parameterization. Our key insight is that DMPs provide a small number of interpretable parameters, and VLMs can set these parameters to specify diverse, continuous, and stable trajectories. Put another way: VLMs can reason over free-form natural language task descriptions, and semantically ground their desired motions into DMPs -- bridging the gap between high-level task reasoning and low-level position and velocity control. Building on this combination of VLMs and DMPs, we formulate our LMP pipeline for zero-shot robot manipulation that effectively completes tabletop manipulation problems by generating a sequence of DMP motions. Across 20 real-world manipulation tasks, we show that LMP achieves 80% task success as compared to 31% for the best-performing baseline. See videos at our website: https://collab.me.vt.edu/lmp
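The DMP parameterization the abstract describes can be illustrated with a minimal sketch: a 1-D discrete DMP whose goal and forcing-term weights are the few interpretable parameters a VLM would set from a language instruction. This is a generic DMP formulation for illustration, assuming standard gain and basis-function heuristics; it is not the paper's implementation, and all names here are hypothetical.

```python
import math

def rollout_dmp(y0, g, w, tau=1.0, dt=0.01, alpha=25.0, beta=6.25, alpha_x=3.0):
    """Integrate a 1-D discrete DMP and return the position trajectory.

    y0, g : start and goal positions
    w     : forcing-term weights -- together with g, the small set of
            interpretable parameters a VLM could choose
    """
    n = len(w)
    # Basis-function centres spaced along the decaying canonical phase.
    centres = [math.exp(-alpha_x * i / max(n - 1, 1)) for i in range(n)]
    widths = [n ** 1.5 / c for c in centres]  # common width heuristic
    y, z, x = y0, 0.0, 1.0
    traj = [y]
    for _ in range(int(round(tau / dt))):
        # Gaussian basis functions evaluated at the phase variable x.
        psi = [math.exp(-h * (x - c) ** 2) for h, c in zip(widths, centres)]
        s = sum(psi)
        f = x * (g - y0) * sum(p * wi for p, wi in zip(psi, w)) / s if s > 1e-10 else 0.0
        # Transformation system: critically damped spring toward g plus forcing term.
        z += dt / tau * (alpha * (beta * (g - y) - z) + f)
        y += dt / tau * z
        # Canonical system: the phase decays, so the forcing term fades
        # and the trajectory converges to the goal.
        x += dt / tau * (-alpha_x * x)
        traj.append(y)
    return traj

# Nonzero weights shape the path, yet the trajectory still settles at g.
traj = rollout_dmp(y0=0.0, g=1.0, w=[50.0, -20.0, 10.0, 0.0, 0.0])
```

Because the spring-damper term guarantees convergence to the goal regardless of the weights, a handful of parameters suffices to specify a stable, continuous motion — the property that makes the DMP parameter space a practical target for VLM reasoning.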
Problem

Research questions and friction points this paper is trying to address.

language grounding
robot manipulation
motion control
task reasoning
embodied AI
Innovation

Methods, ideas, or system contributions that make the work stand out.

Language Movement Primitives
Dynamic Movement Primitives
Vision-Language Models
Zero-shot Robot Manipulation
Motion Grounding
Yinlong Dai
Collab, Dept. of Mechanical Engineering, Virginia Tech, Blacksburg, VA 24061
Benjamin A. Christie
Collab, Dept. of Mechanical Engineering, Virginia Tech, Blacksburg, VA 24061
Daniel J. Evans
Collab, Dept. of Mechanical Engineering, Virginia Tech, Blacksburg, VA 24061
Dylan P. Losey
Collab, Dept. of Mechanical Engineering, Virginia Tech, Blacksburg, VA 24061
Simon Stepputtis
Virginia Tech
Artificial Intelligence · Natural Language Processing · Robotics · Human-Robot Interaction