MultiAiTutor: Child-Friendly Educational Multilingual Speech Generation Tutor with LLMs

📅 2025-08-12
🤖 AI Summary
Problem: Low-quality speech synthesis and poor cultural adaptation hinder multilingual children’s education in low-resource language settings. Method: We propose the first child-centered multilingual text-to-speech (TTS) framework supporting under-resourced languages, including Singaporean-accented Mandarin, Malay, and Tamil, by integrating large language models (LLMs) with multilingual TTS. To enhance cultural relevance, we introduce a culturally aware image captioning task to guide content generation. We further incorporate age-appropriate linguistic modeling and a two-part evaluation protocol that combines objective metrics (e.g., MOS, WER) with child-in-the-loop subjective feedback. Contribution/Results: Experiments demonstrate significant improvements over baselines in speech naturalness, cultural appropriateness, and child comprehension. The framework effectively boosts learning engagement and second-language acquisition outcomes in real-world educational contexts.
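The summary names word error rate (WER) as one of the objective metrics. This page does not describe the paper's actual evaluation tooling, so the following is only a minimal sketch of the standard WER definition: the word-level edit distance between a reference transcript and a recognized transcript, divided by the number of reference words. The function name `word_error_rate` and the Malay example sentence are illustrative assumptions, not taken from the paper. Whitespace tokenization is also an assumption; for Mandarin, which is not whitespace-delimited, a character-level error rate is the more common choice.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Standard WER: (substitutions + deletions + insertions) / reference length.

    Hypothetical helper for illustration; tokenizes on whitespace.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far and the first j hypothesis words (rolling-row Levenshtein).
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words matches an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr.append(
                min(
                    prev[j] + 1,                      # deletion
                    curr[j - 1] + 1,                  # insertion
                    prev[j - 1] + substitution_cost,  # match or substitution
                )
            )
        prev = curr
    return prev[-1] / len(ref)


# Made-up example: one of four reference words is missing from the hypothesis.
print(word_error_rate("saya suka makan nasi", "saya suka nasi"))  # 0.25
```

In a TTS evaluation, the hypothesis would typically come from running a speech recognizer on the synthesized audio, so a lower WER suggests more intelligible speech. Note that WER can exceed 1.0 when the hypothesis contains many inserted words.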

📝 Abstract
Generative speech models have demonstrated significant potential in personalizing teacher-student interactions, offering valuable real-world applications for language learning in children's education. However, achieving high-quality, child-friendly speech generation remains challenging, particularly for low-resource languages across diverse linguistic and cultural contexts. In this paper, we propose MultiAiTutor, an educational multilingual generative AI tutor with child-friendly designs, leveraging an LLM architecture for speech generation tailored for educational purposes. We integrate age-appropriate multilingual speech generation using LLM architectures, facilitating young children's language learning through culturally relevant image-description tasks in three low-resource languages: Singaporean-accented Mandarin, Malay, and Tamil. Experimental results from both objective metrics and subjective evaluations demonstrate the superior performance of the proposed MultiAiTutor compared to baseline methods.
Problem

Research questions and friction points this paper is trying to address.

Child-friendly multilingual speech generation for education
High-quality speech synthesis for low-resource languages
Culturally relevant language learning for young children
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multilingual child-friendly speech generation using LLMs
Age-appropriate educational content in low-resource languages
Culturally relevant image-description tasks for learning