Prompt-Guided Turn-Taking Prediction

📅 2025-06-26
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the limited dynamic controllability of turn-taking prediction in conversational systems by proposing explicit control of turn-taking timing through textual prompts. Natural-language instructions (e.g., “faster”, “calmer”) are embedded and incorporated into a Transformer-based Voice Activity Projection (VAP) model, in both the channel-wise transformers and the cross-channel transformer. Because existing datasets contain no such prompts, synthetic prompt sentences are generated with a large language model (LLM). The key contributions are: (i) moving beyond static turn-taking prediction to instruction-driven control over response timing; and (ii) experiments on over 950 hours of human-human spoken dialogue showing improved prediction accuracy and turn-taking timing that varies according to the prompt.

📝 Abstract
Turn-taking prediction models are essential components of spoken dialogue systems and conversational robots. Recent approaches leverage transformer-based architectures to predict speech activity continuously and in real time. In this study, we propose a novel model that enables turn-taking prediction to be dynamically controlled via textual prompts. This approach allows intuitive and explicit control through instructions such as "faster" or "calmer", adapting dynamically to conversational partners and contexts. The proposed model builds upon a transformer-based voice activity projection (VAP) model, incorporating textual prompt embeddings into both the channel-wise transformers and the cross-channel transformer. We evaluated the feasibility of our approach using over 950 hours of human-human spoken dialogue data. Since textual prompt data for the proposed approach was not available in existing datasets, we used a large language model (LLM) to generate synthetic prompt sentences. Experimental results demonstrated that the proposed model improved prediction accuracy and effectively varied turn-taking timing behavior according to the textual prompts.
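
To make the architecture described in the abstract more concrete, the following is a minimal sketch of one way a prompt embedding could condition both the channel-wise stage and the cross-channel stage. It is not the authors' implementation: the abstract does not specify how the embeddings are fused, so prepending the prompt as an extra attention token is an assumption, and `embed_prompt`, `channel_stage`, `cross_stage`, and `forward` are hypothetical names. The sketch also omits learned projections, causal masking, and the output prediction head that a real VAP model would need.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attend(queries, keys_values):
    """Single-head scaled dot-product attention with no learned weights."""
    dim = len(queries[0])
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim)
                  for k in keys_values]
        weights = softmax(scores)
        out.append([sum(w * kv[d] for w, kv in zip(weights, keys_values))
                    for d in range(dim)])
    return out


def embed_prompt(text, dim):
    """Hypothetical stand-in for a text encoder (assumption): maps a prompt
    string to a deterministic pseudo-random vector."""
    rng = random.Random(text)
    return [rng.uniform(-1.0, 1.0) for _ in range(dim)]


def channel_stage(frames, prompt_vec):
    """Channel-wise stage: one speaker's frames attend over the prompt
    token plus that speaker's own frames."""
    return attend(frames, [prompt_vec] + frames)


def cross_stage(ch_a, ch_b, prompt_vec):
    """Cross-channel stage: each speaker attends over the prompt token
    plus the other speaker's frames."""
    return (attend(ch_a, [prompt_vec] + ch_b),
            attend(ch_b, [prompt_vec] + ch_a))


def forward(frames_a, frames_b, prompt):
    """Run both stages for two speaker channels under one textual prompt."""
    p = embed_prompt(prompt, dim=len(frames_a[0]))
    a = channel_stage(frames_a, p)
    b = channel_stage(frames_b, p)
    return cross_stage(a, b, p)


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy per-frame audio features for two speakers: 5 frames, 8 dimensions.
    speaker_a = [[rng.uniform(-1, 1) for _ in range(8)] for _ in range(5)]
    speaker_b = [[rng.uniform(-1, 1) for _ in range(8)] for _ in range(5)]
    fast_a, _ = forward(speaker_a, speaker_b, "faster")
    calm_a, _ = forward(speaker_a, speaker_b, "calmer")
    # The same audio yields different representations under different prompts.
    print(fast_a != calm_a)
```

In this sketch the prompt influences every frame at both stages because it is always one of the attended tokens, which illustrates how the same audio input can lead to different downstream timing predictions depending on the instruction.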

Problem

Research questions and friction points this paper is trying to address.

Predict turn-taking in dialogue systems dynamically
Control turn-taking via textual prompts like 'faster'
Improve accuracy and adaptability in conversational timing

Innovation

Methods, ideas, or system contributions that make the work stand out.

Dynamic turn-taking control via textual prompts
Transformer-based VAP model with prompt embeddings
LLM-generated synthetic prompts for training