🤖 AI Summary
Ambiguous natural language instructions often obscure the intended robotic grasping actions, leading to insufficient precision in component-level manipulation. Method: We propose a language-driven, component-level grasping framework that integrates a fine-tuned large language model (LLM) with partial point cloud localization guided by 2D part segmentation. The LLM decodes implicit semantic intent from instructions, while 2D segmentation provides part-level priors that guide fine-grained spatial localization in point clouds; an environment-aware fusion algorithm then dynamically generates high-accuracy grasping poses. Results: Experiments demonstrate that our framework accurately identifies key operations and target components from ambiguous instructions in unstructured environments, significantly improving component-level grasp success rates and task adaptability. It establishes a novel paradigm for language–action co-reasoning in embodied intelligence.
📝 Abstract
Existing language-driven grasping methods struggle to handle ambiguous instructions containing implicit intents. To tackle this challenge, we propose LangGrasp, a novel language-interactive robotic grasping framework. The framework integrates fine-tuned large language models (LLMs), leveraging their robust commonsense understanding and environmental perception capabilities to deduce implicit intents from linguistic instructions and clarify task requirements along with target manipulation objects. Furthermore, our point cloud localization module, guided by 2D part segmentation, enables partial point cloud localization in scenes, thereby extending grasping from coarse-grained object-level to fine-grained part-level manipulation. Experimental results show that LangGrasp accurately resolves implicit intents in ambiguous instructions, identifying critical operations and target information that are unstated yet essential for task completion. Additionally, it dynamically selects optimal grasping poses by integrating environmental information. This enables high-precision grasping from object-level to part-level manipulation, significantly enhancing the adaptability and task execution efficiency of robots in unstructured environments. More information and code are available here: https://github.com/wu467/LangGrasp.
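To make the pipeline concrete, below is a minimal, self-contained sketch of the two core stages described above: intent resolution from an ambiguous instruction, and lifting a 2D part-segmentation mask into a partial point cloud. All function names, the keyword-based "LLM" stand-in, and the toy scene data are illustrative assumptions, not the authors' actual API or models.

```python
import numpy as np

def parse_intent(instruction: str) -> dict:
    """Stand-in for the fine-tuned LLM: map an ambiguous instruction to an
    explicit action, object, and target part (here, a toy keyword lookup)."""
    if "water" in instruction:
        return {"action": "grasp", "object": "cup", "part": "handle"}
    return {"action": "grasp", "object": "unknown", "part": "body"}

def localize_part_points(points: np.ndarray, pixels: np.ndarray,
                         part_mask: np.ndarray) -> np.ndarray:
    """Lift a 2D part-segmentation mask to a partial point cloud: keep only
    the 3D points whose projected pixel (u, v) falls inside the part mask."""
    keep = part_mask[pixels[:, 1], pixels[:, 0]]  # index mask as [v, u]
    return points[keep]

# Toy scene: 4 points with known pixel projections in a 2x2 image.
points = np.array([[0.1, 0.0, 0.5],
                   [0.2, 0.0, 0.5],
                   [0.3, 0.1, 0.6],
                   [0.4, 0.1, 0.6]])
pixels = np.array([[0, 0], [1, 0], [0, 1], [1, 1]])  # (u, v) per point
mask = np.array([[True, False],
                 [True, False]])  # "handle" occupies the left image column

intent = parse_intent("I'd like some water")
part_points = localize_part_points(points, pixels, mask)
print(intent["part"], len(part_points))  # handle 2
```

In the real system, `parse_intent` would be the fine-tuned LLM and `part_mask` would come from a 2D part-segmentation model; the environment-aware fusion step that ranks grasp poses over `part_points` is omitted here.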