🤖 AI Summary
Cross-lingual dissemination of academic lectures faces two challenges: existing automated captioning tools struggle to convey discipline-specific semantics and pedagogical prosody, and text-only output increases cognitive load and undermines learning immersion. To address this, we propose the first human-centered multimodal translation framework tailored to educational settings. Our approach integrates speaker-identity-preserving speech synthesis, culturally adaptive multimodal translation, and dynamic cognitive load assessment. By retaining the original speakers' acoustic characteristics (e.g., timbre and prosody) in the translated speech, it jointly addresses linguistic fidelity, cultural appropriateness, and user controllability. A mixed-methods evaluation shows statistically significant reductions in cognitive load (p < 0.01), alongside substantial improvements in comprehension accuracy (+23.6%) and immersion (+31.2%). Instructors and students also rated the system highly on speech naturalness, translation accuracy, and speaker identity consistency.
📝 Abstract
Much valuable academic content is available only in its original language, creating a significant access barrier for the global student community. Translation is particularly challenging in subjects such as history, culture, and the arts, where current automated subtitle tools fail to convey the appropriate pedagogical tone and specialized meaning. In addition, reading traditional automated subtitles increases cognitive load and leads to a disconnected learning experience. Through a mixed-methods study involving 36 participants, we found that GlobalizeEd's dubbed formats significantly reduce cognitive load and offer a more immersive learning experience than traditional subtitles. Although learning effectiveness was comparable between high-quality subtitles and dubbed formats, participants in both groups valued GlobalizeEd's ability to preserve the speaker's voice, which enhanced perceived authenticity. Instructors rated translation accuracy and vocal naturalness highly, while students reported that synchronized, identity-preserving outputs fostered engagement and trust. This work contributes a novel human-centered AI framework for cross-lingual education, demonstrating how multimodal translation systems can balance linguistic fidelity, cultural adaptability, and user control to create more inclusive global learning experiences.