🤖 AI Summary
To address semantic inconsistency and poor interpretability in speech-driven gesture generation, this paper proposes a novel framework integrating a gesture behavior graph with intent-chain reasoning. Methodologically, we introduce the first LLM-driven, structured intent-chain reasoning paradigm, which decomposes speech intent into multi-step semantic units and maps them onto graph-supported gesture labels. We further construct a lightweight intent-chain annotation dataset and a dedicated label generation model to achieve precise text-to-gesture semantic alignment. Experimental results demonstrate a gesture semantic alignment accuracy of 50.2% and an average inference latency of only 0.4 seconds per sample. The framework maintains high-fidelity, speech-synchronized gesture synthesis while substantially enhancing output credibility and interpretability, enabling transparent, stepwise intent-to-gesture mapping grounded in both linguistic semantics and gestural ontology.
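To make the decompose-then-map idea concrete, here is a minimal Python sketch of one plausible reading of intent-chain reasoning: split an utterance into semantic units, infer each unit's intent with an external classifier (standing in for an LLM call), and resolve a gesture label from an ethogram. All names here (`ETHOGRAM`, `IntentStep`, the stub classifier) are illustrative assumptions, not the paper's actual interface.

```python
# Hedged sketch of intent-chain reasoning: utterance -> semantic units ->
# intents -> graph-supported gesture labels. The ethogram entries and the
# classifier are toy stand-ins, not the paper's released artifacts.
from dataclasses import dataclass

# Toy ethogram: gesture labels keyed by the semantic function they serve.
ETHOGRAM = {
    "enumeration": "count_on_fingers",
    "negation": "head_shake_palm_down",
    "emphasis": "beat_downward",
    "spatial_reference": "point_deictic",
}

@dataclass
class IntentStep:
    span: str    # the speech fragment this step covers
    intent: str  # inferred semantic function (an ethogram key)
    label: str   # gesture label resolved from the ethogram

def reason_intent_chain(utterance: str, classify) -> list[IntentStep]:
    """Decompose an utterance into clauses, infer each clause's intent
    via `classify` (e.g. an LLM call), and resolve a gesture label for
    every step from the ethogram."""
    chain = []
    for clause in utterance.split(","):
        clause = clause.strip()
        intent = classify(clause)                   # e.g. "emphasis"
        label = ETHOGRAM.get(intent, "neutral_rest")
        chain.append(IntentStep(clause, intent, label))
    return chain

# Usage with a stub classifier standing in for the LLM:
steps = reason_intent_chain(
    "First we tried it, then we absolutely scrapped it",
    classify=lambda s: "enumeration" if "first" in s.lower() else "emphasis",
)
for s in steps:
    print(f"{s.span!r} -> {s.intent} -> {s.label}")
```

The stepwise chain is what makes the mapping inspectable: each gesture label can be traced back to the clause and intent that produced it.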
📝 Abstract
Co-speech gesture generation enhances the realism of human-computer interaction through speech-synchronized gesture synthesis. However, generating semantically meaningful gestures remains a challenging problem. We propose SARGes, a novel framework that leverages large language models (LLMs) to parse speech content and generate reliable semantic gesture labels, which subsequently guide the synthesis of meaningful co-speech gestures. First, we construct a comprehensive co-speech gesture ethogram and develop an LLM-based intent-chain reasoning mechanism that systematically parses and decomposes gesture semantics into structured inference steps following ethogram criteria, effectively guiding LLMs to generate context-aware gesture labels. Subsequently, we construct an intent-chain-annotated text-to-gesture-label dataset and train a lightweight gesture label generation model, which in turn guides the generation of credible and semantically coherent co-speech gestures. Experimental results demonstrate that SARGes achieves highly semantically aligned gesture labeling (50.2% accuracy) with efficient single-pass inference (0.4 seconds per sample). The proposed method provides an interpretable intent-reasoning pathway for semantic gesture synthesis.
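The abstract's two-stage recipe (an LLM annotates intent chains offline, then a lightweight model learns text-to-label mapping for fast single-pass inference) amounts to a distillation setup. Below is a hedged sketch of what such a setup could look like; the record fields and the scikit-learn pipeline are illustrative assumptions, not the paper's released dataset format or model.

```python
# Hedged sketch of the distillation step: LLM-produced intent-chain
# annotations become (text, gesture label) training pairs for a compact
# classifier, so deployment needs no LLM call per utterance.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# One hypothetical record from an intent-chain-annotated dataset.
record = {
    "text": "and that, honestly, is the third reason",
    "intent_chain": ["marker:honestly", "function:enumeration", "ordinal:3"],
    "gesture_label": "count_on_fingers",
}

# A few toy training pairs (text -> gesture label); the real dataset
# would hold many LLM-annotated examples.
texts = ["the third reason", "no, absolutely not", "look over there"]
labels = ["count_on_fingers", "head_shake_palm_down", "point_deictic"]

# Lightweight label generator: TF-IDF features + logistic regression,
# a stand-in for whatever compact model is actually trained.
model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
model.fit(texts, labels)

print(model.predict(["there were three of them"]))  # single-pass inference
```

Predicted labels would then condition the downstream gesture synthesizer, which is why label accuracy and inference latency are the two metrics the abstract reports.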