🤖 AI Summary
Existing turn-taking detection models for full-duplex speech interaction lack robustness: they are either closed-source and parameter-heavy, or support only a single modality (acoustic or linguistic); LLM-based alternatives additionally depend on full-duplex training data that remain scarce in open-source form. Method: Easy Turn is an open-source, lightweight, modular turn-taking detection model that fuses acoustic and linguistic bimodal information to classify each dialogue turn into four states: complete, incomplete, backchannel, and wait. Contribution/Results: The authors release the Easy Turn trainset, a 1,145-hour speech dataset for training turn-taking detection models, together with an open-source Easy Turn testset. On this testset, the model outperforms open-source baselines such as TEN Turn Detection and Smart Turn V2, achieving state-of-the-art turn-taking detection accuracy and establishing a reproducible, scalable foundation for natural human-machine voice interaction.
📝 Abstract
Full-duplex interaction is crucial for natural human-machine communication, yet it remains challenging because it requires robust turn-taking detection to decide when the system should speak, listen, or remain silent. Existing solutions either rely on dedicated turn-taking models, most of which are not open-sourced and the few available ones are limited by their large parameter size or by supporting only a single modality (acoustic or linguistic), or they finetune LLM backbones to enable full-duplex capability, which requires large amounts of full-duplex data that remain scarce in open-source form. To address these issues, we propose Easy Turn, an open-source, modular turn-taking detection model that integrates acoustic and linguistic bimodal information to predict four dialogue turn states: complete, incomplete, backchannel, and wait. It is accompanied by the release of the Easy Turn trainset, a 1,145-hour speech dataset designed for training turn-taking detection models. Compared with existing open-source models such as TEN Turn Detection and Smart Turn V2, our model achieves state-of-the-art turn-taking detection accuracy on our open-source Easy Turn testset. The data and model will be made publicly available on GitHub.
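To make the four-state prediction task concrete, the sketch below shows a hypothetical late-fusion classifier: two stubbed feature vectors stand in for the acoustic and linguistic encoders described in the abstract, and a single linear layer with softmax maps their concatenation to the four turn states. This is an illustration of the task shape only, not the authors' architecture; all class names, dimensions, and weights here are invented for the example.

```python
import math
import random

# The four dialogue turn states defined in the paper.
STATES = ["complete", "incomplete", "backchannel", "wait"]

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

class BimodalTurnClassifier:
    """Hypothetical late-fusion head: concatenates acoustic and
    linguistic features, then applies one linear layer + softmax.
    Weights are randomly initialized stand-ins for trained parameters."""

    def __init__(self, acoustic_dim, linguistic_dim, seed=0):
        rng = random.Random(seed)
        in_dim = acoustic_dim + linguistic_dim
        self.w = [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)]
                  for _ in range(len(STATES))]
        self.b = [0.0] * len(STATES)

    def predict(self, acoustic_feats, linguistic_feats):
        # Late fusion: simple concatenation of the two modalities.
        x = list(acoustic_feats) + list(linguistic_feats)
        logits = [sum(wi * xi for wi, xi in zip(row, x)) + b
                  for row, b in zip(self.w, self.b)]
        probs = softmax(logits)
        best = max(range(len(STATES)), key=lambda i: probs[i])
        return STATES[best], probs

# Usage: classify one (acoustic, linguistic) feature pair.
clf = BimodalTurnClassifier(acoustic_dim=8, linguistic_dim=8)
state, probs = clf.predict([0.2] * 8, [0.5] * 8)
```

A real system would replace the stub features with outputs of pretrained speech and text encoders; the point here is only that the model emits one of four discrete turn states per decision step.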