Data-Agnostic Robotic Long-Horizon Manipulation with Vision-Language-Guided Closed-Loop Feedback

📅 2025-03-27

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work targets three bottlenecks in language-conditioned robotic execution of long-horizon tasks: poor generalization, limited adaptability, and the absence of large-scale, task-specific datasets. It proposes DAHLIA, a data-agnostic framework with vision-language-guided closed-loop feedback, built on a dual-tunnel large language model (LLM) architecture. A *planner*, working with co-planners, decomposes long-horizon tasks into executable plans using chain-of-thought reasoning and temporal abstraction; a *reporter*, grounded in vision-language feedback, tracks task state, detects failures, and triggers closed-loop re-planning. The framework requires no task-specific training data and supports online planning, execution, and recovery from failures, all driven by natural language instructions. Evaluated in simulation and on real-world robotic platforms, it reports state-of-the-art performance across diverse long-horizon tasks, with strong generalization in both settings.

📝 Abstract

Recent advances in language-conditioned robotic manipulation have leveraged imitation and reinforcement learning to enable robots to execute tasks from human commands. However, these methods often suffer from limited generalization and adaptability, and, unlike data-rich domains such as computer vision, from a lack of large-scale specialized datasets, which makes long-horizon task execution challenging. To address these gaps, we introduce DAHLIA, a data-agnostic framework for language-conditioned long-horizon robotic manipulation, leveraging large language models (LLMs) for real-time task planning and execution. DAHLIA employs a dual-tunnel architecture, where an LLM-powered planner collaborates with co-planners to decompose tasks and generate executable plans, while a reporter LLM provides closed-loop feedback, enabling adaptive re-planning and ensuring task recovery from potential failures. Moreover, DAHLIA integrates chain-of-thought (CoT) in task reasoning and temporal abstraction for efficient action execution, enhancing traceability and robustness. Our framework demonstrates state-of-the-art performance across diverse long-horizon tasks, achieving strong generalization in both simulated and real-world scenarios. Videos and code are available at https://ghiara.github.io/DAHLIA/.

Problem

Research questions and friction points this paper is trying to address.

Executing long-horizon tasks from human commands remains challenging for robots

Language-conditioned manipulation methods generalize and adapt poorly

Large-scale specialized datasets for language-conditioned tasks are lacking

Innovation

Methods, ideas, or system contributions that make the work stand out.

LLM-powered planner with co-planners for task decomposition

Reporter LLM provides closed-loop feedback for re-planning

Integrates chain-of-thought reasoning and temporal abstraction
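
The planner and reporter roles listed above can be pictured as a simple closed loop. The following is a minimal sketch, not DAHLIA's actual implementation: the names `run_closed_loop`, `Report`, `toy_planner`, `toy_executor`, and `toy_reporter` are hypothetical, the LLM calls are replaced by plain Python stubs, and re-planning is simplified to regenerating the whole plan from the accumulated failure feedback.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Report:
    """Reporter verdict for one executed step (hypothetical structure)."""
    success: bool
    feedback: str = ""


# Assumed interfaces: in a real system the planner and reporter would wrap LLM calls.
Planner = Callable[[str, List[str]], List[str]]   # (instruction, failure log) -> steps
Executor = Callable[[str], str]                   # step -> observation
Reporter = Callable[[str, str], Report]           # (step, observation) -> verdict


def run_closed_loop(
    instruction: str,
    planner: Planner,
    executor: Executor,
    reporter: Reporter,
    max_replans: int = 3,
) -> Tuple[bool, int, List[str]]:
    """Plan, execute step by step, and re-plan whenever the reporter flags a failure.

    Returns (success, number of re-plans used, failure log).
    """
    failure_log: List[str] = []
    for attempt in range(max_replans + 1):
        plan = planner(instruction, failure_log)
        failed = False
        for step in plan:
            observation = executor(step)
            report = reporter(step, observation)
            if not report.success:
                # Feed the failure back so the next plan can account for it.
                failure_log.append(f"{step}: {report.feedback}")
                failed = True
                break
        if not failed:
            return True, attempt, failure_log
    return False, max_replans, failure_log


# Toy stand-ins for the LLM planner, the robot, and the LLM reporter.
def toy_planner(instruction: str, failure_log: List[str]) -> List[str]:
    if failure_log:
        return ["regrasp red block", "place red block on blue block"]
    return ["pick red block", "place red block on blue block"]


def toy_executor(step: str) -> str:
    # Pretend the first grasp attempt always slips.
    return "gripper empty" if step == "pick red block" else "ok"


def toy_reporter(step: str, observation: str) -> Report:
    if observation == "ok":
        return Report(True)
    return Report(False, observation)


if __name__ == "__main__":
    print(run_closed_loop(
        "stack the red block on the blue block",
        toy_planner, toy_executor, toy_reporter,
    ))
```

In this toy run the first grasp fails, the reporter's feedback reaches the planner, and the second plan succeeds after one re-plan. A fuller system might resume from the failed step rather than restart the plan; that choice is left open here.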
