GlobalizeEd: A Multimodal Translation System that Preserves Speaker Identity in Academic Lectures

📅 2025-10-13
🤖 AI Summary
Cross-lingual dissemination of academic lectures faces dual challenges: existing automated captioning tools struggle to accurately convey discipline-specific semantics and pedagogical prosody, while text-only output exacerbates cognitive load and undermines learning immersion. To address this, we propose the first human-centered multimodal translation framework tailored for educational settings. Our approach innovatively integrates speaker-identity-preserving speech synthesis, culturally adaptive multimodal translation, and dynamic cognitive load assessment—retaining original speakers’ acoustic characteristics (e.g., timbre, prosody) in translated speech to jointly ensure linguistic fidelity, cultural appropriateness, and user controllability. Mixed-method evaluation demonstrates statistically significant reductions in cognitive load (p < 0.01), alongside substantial improvements in comprehension accuracy (+23.6%) and immersion (+31.2%). The system also receives high ratings from instructors and students on speech naturalness, translation accuracy, and speaker identity consistency.

📝 Abstract
A large amount of valuable academic content is available only in its original language, creating a significant access barrier for the global student community. This challenge is especially acute in subjects such as history, culture, and the arts, where current automated subtitle tools fail to convey the appropriate pedagogical tone and specialized meaning. In addition, reading traditional automated subtitles increases cognitive load and leads to a disconnected learning experience. Through a mixed-methods study involving 36 participants, we found that GlobalizeEd's dubbed formats significantly reduce cognitive load and offer a more immersive learning experience compared to traditional subtitles. Although learning effectiveness was comparable between high-quality subtitles and dubbed formats, both groups valued GlobalizeEd's ability to preserve the speaker's voice, which enhanced perceived authenticity. Instructors rated translation accuracy and vocal naturalness highly, whereas students reported that synchronized, identity-preserving outputs fostered engagement and trust. This work contributes a novel human-centered AI framework for cross-lingual education, demonstrating how multimodal translation systems can balance linguistic fidelity, cultural adaptability, and user control to create more inclusive global learning experiences.
Problem

Research questions and friction points this paper is trying to address.

Translating academic lectures while preserving original speaker identity and vocal characteristics
Reducing cognitive load caused by traditional subtitles in cross-lingual learning
Maintaining pedagogical tone and cultural authenticity in specialized subject translations
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses multimodal translation system for academic lectures
Preserves speaker identity and voice in translation
Balances linguistic fidelity with cultural adaptability
Hoang-Son Vo
AI Convergence, Chonnam National University
Computer Vision · Medical Image Processing · Image Generation · 3D Image
Karina Kolmogortseva
Chonnam National University, Republic of Korea
Ngumimi Karen Iyortsuun
Chonnam National University, Republic of Korea
Hong-Duyen Vo
FPT University, Viet Nam
Soo-Hyung Kim
Chonnam National University, Republic of Korea