Kinodynamic Task and Motion Planning using VLM-guided and Interleaved Sampling

📅 2025-10-30

🤖 AI Summary

Existing task and motion planning (TAMP) approaches incur prohibitive computational cost on long-horizon tasks because of excessive motion sampling, and while large language models (LLMs) encode commonsense priors, they lack 3D geometric and dynamic reasoning. This paper proposes a TAMP framework driven by a vision-language model (VLM) that represents symbolic task states and continuous motion states in a single hybrid state tree. During search, it couples VLM-based visual reasoning with dynamic validation through VLM-guided sampling, interleaved task and motion search, joint verification with an off-the-shelf motion planner and physics simulator, and visual rendering of intermediate states that the VLM uses to redirect or backtrack the search. In simulated and real-world experiments, the method raises average task success rates by 32.14% to 1166.67% over traditional and LLM-based baselines while reducing planning time on complex problems. Ablation studies confirm that VLM guidance contributes to both planning efficiency and solution feasibility.
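The summary does not say how VLM guidance enters the sampler. The sketch below is a minimal illustration, assuming the guidance acts as a ranking over uniformly drawn candidates so that only the top few reach the costly feasibility check. The names `guided_sample`, `score`, and `is_feasible`, and the planar (x, y, yaw) placement parameter, are hypothetical; `score` stands in for a VLM query and `is_feasible` for the motion planner and physics simulator. It is not the authors' implementation.

```python
import math
import random
from typing import Callable, List, Tuple

# Hypothetical continuous action parameter: a planar placement (x, y, yaw).
Pose = Tuple[float, float, float]


def guided_sample(
    n_candidates: int,
    k: int,
    score: Callable[[Pose], float],       # hypothetical stand-in for a VLM preference score
    is_feasible: Callable[[Pose], bool],  # hypothetical stand-in for planner/simulator checks
    rng: random.Random,
) -> List[Pose]:
    """Rank uniformly drawn candidates and verify only the k best-ranked ones."""
    candidates = [
        (rng.uniform(0.0, 1.0), rng.uniform(0.0, 1.0), rng.uniform(-math.pi, math.pi))
        for _ in range(n_candidates)
    ]
    # Cheap step: order every candidate by the guidance score (highest first).
    ranked = sorted(candidates, key=score, reverse=True)
    # Expensive step: only the top k candidates reach the feasibility check.
    return [pose for pose in ranked[:k] if is_feasible(pose)]


if __name__ == "__main__":
    # Toy guidance: prefer placements near x = 0.8; toy feasibility: y must stay below 0.9.
    poses = guided_sample(
        n_candidates=200,
        k=5,
        score=lambda p: -abs(p[0] - 0.8),
        is_feasible=lambda p: p[1] < 0.9,
        rng=random.Random(0),
    )
    for pose in poses:
        print(pose)
```

The design point of this assumed scheme is that the number of expensive feasibility checks is bounded by `k` rather than by `n_candidates`, which is one way guidance could curb excessive motion sampling.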

📝 Abstract

Task and Motion Planning (TAMP) integrates high-level task planning with low-level motion feasibility, but existing methods are costly in long-horizon problems due to excessive motion sampling. While LLMs provide commonsense priors, they lack 3D spatial reasoning and cannot ensure geometric or dynamic feasibility. We propose a kinodynamic TAMP framework based on a hybrid state tree that uniformly represents symbolic and numeric states during planning, enabling task and motion decisions to be made jointly. Kinodynamic constraints embedded in the TAMP problem are verified by an off-the-shelf motion planner and physics simulator, and a VLM guides the exploration of a TAMP solution and backtracks the search based on visual renderings of the states. Experiments in simulated domains and in the real world show 32.14% to 1166.67% higher average success rates than traditional and LLM-based TAMP planners, as well as reduced planning time on complex problems, with ablations further highlighting the benefits of VLM guidance.
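The abstract describes the hybrid state tree only at a high level. The sketch below illustrates the general idea under stated assumptions: each node pairs a symbolic fact set with a numeric configuration, successors are ordered by a `rank` callable standing in for the VLM, and each motion step is checked lazily by a `verify` callable standing in for the motion planner and physics simulator. A failed check prunes that branch and the search backtracks. `Node`, `plan`, `extract_plan`, and all callables are hypothetical names, and depth-first order is a simplification; the paper's interleaving strategy may differ.

```python
from dataclasses import dataclass, field
from typing import Callable, FrozenSet, List, Optional, Tuple

Config = Tuple[float, ...]
Successor = Tuple[str, FrozenSet[str], Config]  # (action, next facts, next config)


@dataclass
class Node:
    facts: FrozenSet[str]            # symbolic task state, e.g. {"holding"}
    config: Config                   # numeric state, e.g. robot or object poses
    action: Optional[str] = None     # action that led from the parent to this node
    parent: Optional["Node"] = None
    children: List["Node"] = field(default_factory=list)


def extract_plan(node: Node) -> List[str]:
    """Follow parent pointers back to the root and return the action sequence."""
    actions = []
    while node.parent is not None:
        actions.append(node.action)
        node = node.parent
    return actions[::-1]


def plan(
    root: Node,
    expand: Callable[[Node], List[Successor]],
    rank: Callable[[Node, List[Successor]], List[Successor]],  # stand-in for VLM guidance
    verify: Callable[[Node, Node], bool],  # stand-in for motion planner + simulator
    is_goal: Callable[[Node], bool],
    max_expansions: int = 1000,
) -> Optional[List[str]]:
    """Depth-first search over a hybrid state tree with lazy verification."""
    stack = [root]
    expansions = 0
    while stack and expansions < max_expansions:
        node = stack.pop()
        # Verify the step into this node only when it is selected. A failure prunes
        # the branch, and the search backtracks to the next pending node on the stack.
        if node.parent is not None and not verify(node.parent, node):
            continue
        if is_goal(node):
            return extract_plan(node)
        expansions += 1
        # Push in reverse rank order so the best-ranked successor is popped first.
        for action, facts, config in reversed(rank(node, expand(node))):
            child = Node(facts, config, action, node)
            node.children.append(child)
            stack.append(child)
    return None


if __name__ == "__main__":
    # Toy 1-D domain: move along x in [0, 3]; "grasp" is only feasible at x == 2.
    def expand(n: Node) -> List[Successor]:
        x = n.config[0]
        return [
            ("grasp", n.facts | {"holding"}, (x,)),
            ("left", n.facts, (x - 1,)),
            ("right", n.facts, (x + 1,)),
        ]

    preference = {"grasp": 0, "right": 1, "left": 2}

    def rank(n: Node, succ: List[Successor]) -> List[Successor]:
        return sorted(succ, key=lambda s: preference[s[0]])

    def verify(parent: Node, child: Node) -> bool:
        x = child.config[0]
        if not 0 <= x <= 3:
            return False
        return x == 2 if child.action == "grasp" else True

    print(plan(Node(frozenset(), (0,)), expand, rank, verify,
               lambda n: "holding" in n.facts))
```

In this toy domain the ranking tries `grasp` first at every node, the verifier rejects it until x reaches 2, and the script prints `['right', 'right', 'grasp']`. Verifying lazily means only branches the guidance actually selects pay the cost of a motion-planner or simulator call.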

Problem

Reduces excessive motion sampling in long-horizon TAMP problems

Integrates symbolic and numeric states for joint task-motion decisions

Ensures kinodynamic feasibility using motion planner and physics simulator

Innovation

Hybrid state tree integrates symbolic and numeric states

VLM guides task planning with visual state rendering

Kinodynamic constraints verified by motion planner and simulator