Learning to Generate Pointing Gestures in Situated Embodied Conversational Agents

πŸ“… 2025-09-15
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ“„ PDF
πŸ€– AI Summary
This work addresses inaccurate and unnatural gesturing by embodied agents (e.g., robots) during deictic communication with humans in physical environments. We propose a gesture generation framework that integrates imitation learning (IL) with hierarchical reinforcement learning (HRL), jointly modeling motion control policies and referential semantics from only small-scale motion-capture data, thereby ensuring both physical plausibility and deictic precision. In human-subject evaluations within a virtual-reality deictic task, our method significantly outperforms purely supervised baselines, improving referential accuracy by +12.3% and achieving statistically significant gains in perceived naturalness (p < 0.01). The framework has also been deployed on a real robotic platform, demonstrating practical viability. The core contribution is the first integration of hierarchical RL into an IL pipeline, enabling high-fidelity deictic gesture synthesis in low-data regimes.

πŸ“ Abstract
One of the main goals of robotics and intelligent agent research is to enable natural communication with humans in physically situated settings. While recent work has focused on verbal modes such as language and speech, non-verbal communication is crucial for flexible interaction. We present a framework for generating pointing gestures in embodied agents by combining imitation and reinforcement learning. Using a small motion capture dataset, our method learns a motor control policy that produces physically valid, naturalistic gestures with high referential accuracy. We evaluate the approach against supervised learning and retrieval baselines in both objective metrics and a virtual reality referential game with human users. Results show that our system achieves higher naturalness and accuracy than state-of-the-art supervised models, highlighting the promise of imitation-RL for communicative gesture generation and its potential application to robots.
Problem

Research questions and friction points this paper is trying to address.

Generating natural pointing gestures for embodied agents
Combining imitation and reinforcement learning for gesture control
Improving referential accuracy and naturalness in non-verbal communication
Innovation

Methods, ideas, or system contributions that make the work stand out.

Combining imitation and reinforcement learning
Learning motor control policy for gestures
Producing physically valid naturalistic gestures
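
The imitation-plus-RL combination listed above can be sketched as a reward with two terms: one scoring deictic accuracy (how well the arm ray points at the referent) and one scoring similarity to motion-capture reference poses. The function below is a minimal illustrative sketch; its names, geometry, and weights are assumptions for exposition, not the paper's actual objective.

```python
import math

def pointing_reward(shoulder, wrist, target, ref_pose, pose,
                    w_point=1.0, w_imit=0.5):
    """Toy combined reward: deictic accuracy plus imitation similarity.

    shoulder, wrist, target: 3-D points as tuples; ref_pose, pose: joint-angle
    lists. All names and weights here are illustrative, not from the paper.
    """
    # Deictic term: cosine between the shoulder->wrist ray
    # and the shoulder->target ray (1.0 = pointing exactly at the target).
    arm = [w - s for w, s in zip(wrist, shoulder)]
    to_tgt = [t - s for t, s in zip(target, shoulder)]
    dot = sum(a * b for a, b in zip(arm, to_tgt))
    norm = math.dist(wrist, shoulder) * math.dist(target, shoulder)
    accuracy = dot / norm if norm > 0 else 0.0

    # Imitation term: negative mean squared deviation from a
    # motion-capture reference pose (0.0 = perfect match).
    imitation = -sum((p - r) ** 2 for p, r in zip(pose, ref_pose)) / len(pose)

    return w_point * accuracy + w_imit * imitation
```

An RL policy trained on such a reward is pushed toward gestures that both resolve the referent and stay close to the naturalistic poses seen in the mocap data, which is the trade-off the paper's framework targets.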
πŸ”Ž Similar Papers
No similar papers found.
Anna Deichler
Division of Speech, Music and Hearing, KTH Royal Institute of Technology, Stockholm, Sweden
Siyang Wang
Division of Speech, Music and Hearing, KTH Royal Institute of Technology, Stockholm, Sweden
Simon Alexanderson
KTH Royal Institute of Technology
motion synthesis · non-verbal communication · motion capture
Jonas Beskow
Professor, KTH Speech, Music and Hearing
multimodal interaction · social robotics · speech synthesis · motion synthesis · sign language processing