🤖 AI Summary
This paper addresses real-time robotic trajectory adjustment guided by natural language instructions. Methodologically, it introduces a training-free, geometry-aware framework: (1) a vision-language model parses scene objects into geometric primitives; (2) a large language model translates natural-language instructions into explicit geometric and kinematic constraints; (3) potential-field optimization safely reshapes the initial trajectory, while a multi-agent coordination strategy resolves complex or conflicting instructions. Key contributions include: (1) the first zero-shot semantic-to-geometric constraint-mapping mechanism; (2) joint optimization of safety, trajectory smoothness, and interpretability; and (3) superior performance over state-of-the-art methods both in simulation and on real-world robotic platforms, demonstrating dynamic adaptability, robustness, and human-interpretable, language-driven trajectory replanning.
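The semantic-to-geometric constraint mapping in step (2) can be pictured as the LLM emitting a structured, machine-checkable form rather than free text. The sketch below is a minimal illustration of what such a structured output might look like, not the authors' actual schema; all class and field names here are hypothetical assumptions.

```python
# Hedged sketch: a plausible structured form for the geometric and
# kinematic constraints an LLM could emit from an instruction such as
# "keep 20 cm away from the cup and slow down near it".
# All names and fields are illustrative, not the paper's schema.
from dataclasses import dataclass

@dataclass
class GeometricConstraint:
    primitive: str          # primitive type registered by the VLM, e.g. "sphere"
    center: tuple           # primitive location in the robot frame (x, y, z), metres
    min_clearance: float    # required clearance from the primitive, metres

@dataclass
class KinematicConstraint:
    region_center: tuple    # centre of the region the constraint applies to
    region_radius: float    # radius of that region, metres
    max_speed: float        # speed cap inside the region, m/s

# A hand-written example of the mapping for the instruction above
# (a real system would obtain this from the LLM's structured output):
keep_away = GeometricConstraint(primitive="sphere",
                                center=(0.5, 0.0, 0.3),
                                min_clearance=0.20)
slow_down = KinematicConstraint(region_center=(0.5, 0.0, 0.3),
                                region_radius=0.30,
                                max_speed=0.10)
```

Representing constraints this way is what makes the downstream optimization and the resulting trajectory edits interpretable: each modification can be traced back to an explicit clearance or speed bound.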
📝 Abstract
We present ZLATTE, a geometry-aware, learning-free framework for language-driven trajectory reshaping in human-robot interaction. Unlike prior learning-based methods, ZLATTE leverages Vision-Language Models to register objects as geometric primitives and employs a Large Language Model to translate natural language instructions into explicit geometric and kinematic constraints. These constraints are integrated into a potential-field optimization to adapt initial trajectories while preserving feasibility and safety. A multi-agent strategy further enhances robustness under complex or conflicting commands. Simulation and real-world experiments demonstrate that ZLATTE achieves smoother, safer, and more interpretable trajectory modifications compared to state-of-the-art baselines.
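To make the potential-field step concrete, the sketch below shows one common way such an optimization can reshape a trajectory: a repulsive term pushes waypoints out of an obstacle's clearance radius while a smoothness term keeps the path coherent. This is a generic illustration under assumed parameters (`clearance`, `k_rep`, `k_smooth`, etc.), not ZLATTE's actual formulation.

```python
# Hedged sketch (not the authors' code): potential-field trajectory
# reshaping around one spherical primitive. All gains and names are
# illustrative assumptions.
import numpy as np

def reshape_trajectory(traj, obstacle_center, clearance=0.3,
                       k_rep=1.0, k_smooth=0.1, steps=200, lr=0.05):
    """Push interior waypoints out of the obstacle's clearance radius
    while Laplacian smoothing keeps the path smooth; endpoints are fixed."""
    traj = np.asarray(traj, dtype=float).copy()
    c = np.asarray(obstacle_center, dtype=float)
    for _ in range(steps):
        grad = np.zeros_like(traj)
        # Repulsive term: active only inside the clearance radius,
        # pointing away from the obstacle centre.
        diff = traj - c
        dist = np.linalg.norm(diff, axis=1, keepdims=True)
        inside = dist < clearance
        grad += np.where(inside,
                         k_rep * (clearance - dist) * diff / np.maximum(dist, 1e-9),
                         0.0)
        # Smoothness term: pull each interior waypoint toward the
        # average of its neighbours (discrete Laplacian).
        grad[1:-1] += k_smooth * (traj[:-2] - 2.0 * traj[1:-1] + traj[2:])
        grad[0] = grad[-1] = 0.0  # keep start and goal fixed
        traj += lr * grad
    return traj

# Usage: a straight line passing close to an obstacle gains clearance.
traj0 = np.linspace([-1.0, 0.0, 0.0], [1.0, 0.0, 0.0], 11)
obstacle = np.array([0.0, 0.1, 0.0])
reshaped = reshape_trajectory(traj0, obstacle)
```

In a full pipeline, the obstacle geometry would come from the VLM-registered primitives and the clearance from the LLM-derived constraints, so each deformation remains attributable to a specific instruction.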