A Generative System for Robot-to-Human Handovers: from Intent Inference to Spatial Configuration Imagery

📅 2025-03-05
📈 Citations: 0
Influential: 0
📄 PDF

career value

198K/year
🤖 AI Summary
This work addresses the unnatural human intent inference and spatial configuration generation in human-robot object handover. We propose the first generative handover system integrating motor imagery cognitive modeling. Methodologically, we introduce cognitive imagery modeling—pre-handover mental simulation of motion—into robotic handover tasks for the first time; combine vision-language multimodal intent understanding with diffusion-model-driven spatial configuration synthesis; and realize end-to-end simulation from “intending to hand over” to “how to hand over” under robot kinematic constraints. We further design a real-time intent decoding framework supporting fused RGB-D and speech perception. Experiments in real-world human-robot interaction demonstrate 92% intent recognition accuracy and 87% user-rated naturalness—significantly outperforming baselines—while exhibiting high fluency, interpretability, and environmental adaptability.

Technology Category

Application Category

📝 Abstract
We propose a novel system for robot-to-human object handover that emulates human coworker interactions. Unlike most existing studies that focus primarily on grasping strategies and motion planning, our system focus on 1. inferring human handover intents, 2. imagining spatial handover configuration. The first one integrates multimodal perception-combining visual and verbal cues-to infer human intent. The second one using a diffusion-based model to generate the handover configuration, involving the spacial relationship among robot's gripper, the object, and the human hand, thereby mimicking the cognitive process of motor imagery. Experimental results demonstrate that our approach effectively interprets human cues and achieves fluent, human-like handovers, offering a promising solution for collaborative robotics. Code, videos, and data are available at: https://i3handover.github.io.
Problem

Research questions and friction points this paper is trying to address.

Infer human handover intent using multimodal perception.
Generate spatial handover configuration via diffusion-based model.
Achieve fluent, human-like robot-to-human object handovers.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multimodal perception for intent inference
Diffusion-based spatial configuration imagery
Human-like robot-to-human handover system
🔎 Similar Papers
No similar papers found.
H
Hanxin Zhang
DANiLab, School of Computing and Mathematical Sciences, University of Leicester, Leicester, UK
A
Abdulqader Dhafer
DANiLab, School of Computing and Mathematical Sciences, University of Leicester, Leicester, UK
Z
Zhou Daniel Hao
DANiLab, School of Computing and Mathematical Sciences, University of Leicester, Leicester, UK
Hongbiao Dong
Hongbiao Dong
University of Leicester
Modelling of solidificationdata-driven modellingcastingweldingthermal analysis