A Generative System for Robot-to-Human Handovers: from Intent Inference to Spatial Configuration Imagery

📅 2025-03-05

📈 Citations: 0

✨ Influential: 0

career value

198K/year

🤖 AI Summary

This work addresses the unnatural human intent inference and spatial configuration generation in human-robot object handover. We propose the first generative handover system integrating motor imagery cognitive modeling. Methodologically, we introduce cognitive imagery modeling—pre-handover mental simulation of motion—into robotic handover tasks for the first time; combine vision-language multimodal intent understanding with diffusion-model-driven spatial configuration synthesis; and realize end-to-end simulation from “intending to hand over” to “how to hand over” under robot kinematic constraints. We further design a real-time intent decoding framework supporting fused RGB-D and speech perception. Experiments in real-world human-robot interaction demonstrate 92% intent recognition accuracy and 87% user-rated naturalness—significantly outperforming baselines—while exhibiting high fluency, interpretability, and environmental adaptability.

Technology Category

Application Category

📝 Abstract

We propose a novel system for robot-to-human object handover that emulates human coworker interactions. Unlike most existing studies that focus primarily on grasping strategies and motion planning, our system focus on 1. inferring human handover intents, 2. imagining spatial handover configuration. The first one integrates multimodal perception-combining visual and verbal cues-to infer human intent. The second one using a diffusion-based model to generate the handover configuration, involving the spacial relationship among robot's gripper, the object, and the human hand, thereby mimicking the cognitive process of motor imagery. Experimental results demonstrate that our approach effectively interprets human cues and achieves fluent, human-like handovers, offering a promising solution for collaborative robotics. Code, videos, and data are available at: https://i3handover.github.io.

Problem

Research questions and friction points this paper is trying to address.

Infer human handover intent using multimodal perception.

Generate spatial handover configuration via diffusion-based model.

Achieve fluent, human-like robot-to-human object handovers.

Innovation

Methods, ideas, or system contributions that make the work stand out.

Multimodal perception for intent inference

Diffusion-based spatial configuration imagery

Human-like robot-to-human handover system

🔎 Similar Papers

No similar papers found.