Building a Taiwanese Mandarin Spoken Language Model: A First Attempt

📅 2024-11-11
🏛️ arXiv.org
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
To address the lack of spoken large language models (LLMs) tailored to Taiwanese Mandarin, this paper introduces the first end-to-end, real-time spoken dialogue model designed specifically for Taiwanese Mandarin (Guoyu). The authors adopt a decoder-only Transformer architecture, construct a high-quality synthetic spoken dialogue dataset, design a low-latency full-duplex speech interaction mechanism, and establish a multi-turn coherence evaluation framework. The key contributions are: (1) the first full-duplex speech interaction modeling for Taiwanese Mandarin; and (2) a novel training paradigm and evaluation protocol designed explicitly for real-time spoken dialogue. Experimental results demonstrate that the prototype system supports natural, fluent multi-turn voice conversations, with sub-300 ms response latency and validated semantic coherence. This work provides a reproducible technical pathway for developing spoken LLMs for dialectal Chinese.

📝 Abstract
This technical report presents our initial attempt to build a spoken large language model (LLM) for Taiwanese Mandarin, specifically tailored to enable real-time, speech-to-speech interaction in multi-turn conversations. Our end-to-end model incorporates a decoder-only transformer architecture and aims to achieve seamless interaction while preserving the conversational flow, including full-duplex capabilities allowing simultaneous speaking and listening. The paper also details the training process, including data preparation with synthesized dialogues and adjustments for real-time interaction. We also developed a platform to evaluate conversational fluency and response coherence in multi-turn dialogues. We hope the release of the report can contribute to the future development of spoken LLMs in Taiwanese Mandarin.

Problem

Research questions and friction points this paper is trying to address.

Real-time Speech Communication
Taiwanese Mandarin
Natural Conversation

Innovation

Methods, ideas, or system contributions that make the work stand out.

Taiwanese Mandarin
Real-time Speech Interaction
Virtual Dialogue Data Training