Everyday Speech in the Indian Subcontinent

📅 2024-10-14
🏛️ arXiv.org
📈 Citations: 1
Influential: 0
🤖 AI Summary
India has 1,369 languages written in about 13 scripts, and its population routinely switches between languages in everyday speech. To address multilingual text-to-speech (TTS) synthesis in this setting, this work introduces the first end-to-end TTS framework tailored for everyday code-switching. Methodologically, the authors propose a phonetics-based Common Label Set (CLS): Indian-language text is first converted to CLS, which gives a script- and language-agnostic input representation and removes the reliance on large, language-specific inventories of units. A single acoustic model trained on this unified representation supports seamless, speaker-consistent synthesis across 13 Indian languages and English within one voice. Experiments demonstrate state-of-the-art performance in naturalness (MOS), code-switching accuracy, and cross-lingual generalization, while significantly improving prosodic continuity and speaker identity preservation during intra-sentence language switching.

📝 Abstract
India has 1,369 languages, of which 22 are official. About 13 different scripts are used to write these languages. A Common Label Set (CLS), based on phonetics, was developed to address the large vocabulary of units that the End-to-End (E2E) framework would otherwise require for multilingual synthesis. Indian language text is first converted to CLS. This approach enables seamless code switching across 13 Indian languages and English in a given native speaker's voice, which corresponds to everyday speech in the Indian subcontinent, where the population is multilingual.
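The text-to-CLS conversion step can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the table `TOY_CLS`, its label names, and the function `to_toy_cls` are hypothetical and are not the actual Common Label Set or the authors' code. The sketch only shows the general idea that characters from different scripts with the same sound can share one label, so the unit inventory does not grow with each added script.

```python
# Toy illustration only: the labels and mappings below are hypothetical,
# not the actual Common Label Set (CLS) described in the paper.
TOY_CLS = {
    # Devanagari consonant letters (as used for Hindi)
    "क": "k", "म": "m", "ल": "l",
    # Tamil consonant letters with the same sounds
    "க": "k", "ம": "m", "ல": "l",
}


def to_toy_cls(text: str) -> list[str]:
    """Map each character to a shared label; skip whitespace.

    Characters not in the table (e.g. English letters) are passed
    through lowercased. Real Indic text also needs vowel signs,
    conjuncts, and inherent-vowel handling, which this sketch ignores.
    """
    labels = []
    for ch in text:
        if ch.isspace():
            continue
        labels.append(TOY_CLS.get(ch, ch.lower()))
    return labels


hindi = to_toy_cls("कमल")           # Devanagari input
tamil = to_toy_cls("கமல")           # Tamil input
mixed = to_toy_cls("कमल is கமல")    # code-switched input

# Six script-specific characters collapse to three shared labels.
script_units = len(TOY_CLS)
label_units = len(set(TOY_CLS.values()))
```

In this toy setup the Devanagari and Tamil inputs yield the same label sequence, and a code-switched sentence becomes one flat sequence over a single small label inventory, which is the property the abstract attributes to CLS.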
Problem

Research questions and friction points this paper is trying to address.

Develop CLS for multilingual synthesis
Enable code switching across languages
Address large vocabulary in E2E framework
Innovation

Methods, ideas, or system contributions that make the work stand out.

Common Label Set
multilingual synthesis
seamless code switching
Utkarsh Pathak
Research Scholar at Speech and Music Lab, IIT Madras
Text to speech, Zero-shot Speech, Speech Enhancement, IndicTTS
Chandra Sai Krishna Gunda
Dept. of Computer Science & Engg., Indian Institute of Technology Madras, Chennai, India
Sujitha Sathiyamoorthy
Dept. of Computer Science & Engg., Indian Institute of Technology Madras, Chennai, India
Keshav Agarwal
Dept. of Computer Science & Engg., Indian Institute of Technology Madras, Chennai, India
H. Murthy
Dept. of Computer Science & Engg., Indian Institute of Technology Madras and Shiv Nadar University, Chennai, India