A Multi-Modal Interaction Framework for Efficient Human-Robot Collaborative Shelf Picking

📅 2025-04-09
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
To address low human-robot collaboration efficiency and poor stability in dynamic warehouse environments for shelf picking, this paper proposes a multimodal, physics-aware, and task-decomposed collaborative framework. Methodologically, it introduces a novel LLM-driven Chain-of-Thought reasoning mechanism jointly optimized with rigid-body physics simulation, integrated with a relational graph to enable subtask generation and sequential planning. The system incorporates RGB-D and speech perception, gesture recognition, spoken language understanding, and audiovisual feedback to ensure safe bin grasping and real-time collaborative response. Evaluated in realistic warehouse settings across three experiments, the framework achieves a 37% improvement in task success rate and reduces end-to-end human-robot collaboration latency to under 1.2 seconds—significantly overcoming stability bottlenecks in cluttered, dynamically changing environments.

Technology Category

Application Category

📝 Abstract
The growing presence of service robots in human-centric environments, such as warehouses, demands seamless and intuitive human-robot collaboration. In this paper, we propose a collaborative shelf-picking framework that combines multimodal interaction, physics-based reasoning, and task division for enhanced human-robot teamwork. The framework enables the robot to recognize human pointing gestures, interpret verbal cues and voice commands, and communicate through visual and auditory feedback. Moreover, it is powered by a Large Language Model (LLM) which utilizes Chain of Thought (CoT) and a physics-based simulation engine for safely retrieving cluttered stacks of boxes on shelves, relationship graph for sub-task generation, extraction sequence planning and decision making. Furthermore, we validate the framework through real-world shelf picking experiments such as 1) Gesture-Guided Box Extraction, 2) Collaborative Shelf Clearing and 3) Collaborative Stability Assistance.
Problem

Research questions and friction points this paper is trying to address.

Enhancing human-robot teamwork through multimodal interaction
Safely retrieving cluttered boxes using physics-based reasoning
Validating framework with real-world shelf picking tasks
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multimodal interaction for human-robot teamwork
LLM with CoT and physics-based reasoning
Real-world experiments validate framework
🔎 Similar Papers
No similar papers found.