Multi-task Learning is Not Enough: Representational Entanglement in Dual-output Second Language Speech Recognition

📅 2026-06-04

🤖 AI Summary

This study addresses the degradation in surface (as-pronounced) transcription that arises in dual-output second-language (L2) speech recognition, where a model transcribes both what a speaker actually said and what the speaker intended to say. Comparing Korean and English within a dual-output multi-task architecture, the work links this degradation to entangled encoder representations: Korean preserves distinct task-specific representations, whereas English produces nearly identical ones. The authors quantify surface-meaning divergence using Levenshtein edit distance and analyze both encoder and decoder representations. They find that the meaning decoder adapts by forming its own representation, while the surface decoder remains constrained by the shared encoder, which challenges the assumption that shared representations are inherently beneficial. These findings motivate multi-task designs that mitigate encoder-level entanglement.
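One way to make "nearly identical" encoder representations concrete is to compare pooled encoder outputs for the same utterance under the two tasks. The sketch below is illustrative only: the summary does not state which similarity measure the authors use, so cosine similarity over mean-pooled frames, and the names `mean_pool` and `entanglement_score`, are assumptions made for this example.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def mean_pool(frames):
    """Average a list of frame vectors into a single utterance-level vector."""
    if not frames:
        raise ValueError("need at least one frame")
    return [sum(column) / len(frames) for column in zip(*frames)]


def entanglement_score(surface_frames, meaning_frames):
    """Hypothetical proxy: similarity of pooled encoder states for the two tasks.

    Values near 1.0 would suggest the encoder barely separates the tasks;
    lower values would suggest more task-specific representations.
    """
    return cosine(mean_pool(surface_frames), mean_pool(meaning_frames))
```

Under this proxy, the pattern described for English would show up as scores close to 1.0, while the pattern described for Korean would show up as lower scores. It is a rough diagnostic, not a reproduction of the paper's analysis.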

📝 Abstract

Second-language (L2) speech recognition often requires transcriptions of pronunciations and intended meanings. Multi-task learning (MTL) is a natural approach because it assumes that shared representations benefit both outputs. However, this paper shows that this assumption does not hold across Korean and English. MTL improves meaning but degrades surface transcription, especially in English, where the degradation scales with surface-meaning divergence measured by Levenshtein edit distance. Encoder analysis links these patterns to encoder-level entanglement, with Korean preserving distinct task representations while English produces nearly identical ones. Cross-task decoder analysis shows that the meaning dual-output decoder adapts with a unique representation, while the surface dual-output decoder remains constrained by the encoder. These findings motivate the design of MTL frameworks that mitigate encoder-level entanglement to reduce surface degradation in dual-output L2 automatic speech recognition.
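The surface-meaning divergence mentioned above can be illustrated with a standard Levenshtein edit distance between the two transcripts of one utterance. This is a minimal sketch: the abstract does not say whether distance is computed over characters, phonemes, or words, nor how it is normalized, so the character-level comparison, the normalization by the longer transcript, and the example sentence are all assumptions.

```python
def levenshtein(a, b):
    """Minimum number of insertions, deletions, and substitutions turning a into b.

    Works on strings (character level) or on lists of tokens.
    """
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,             # delete x
                curr[j - 1] + 1,         # insert y
                prev[j - 1] + (x != y),  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


def divergence(surface, meaning):
    """Edit distance divided by the longer transcript length (assumed normalization)."""
    longest = max(len(surface), len(meaning))
    return levenshtein(surface, meaning) / longest if longest else 0.0


# Hypothetical L2 utterance: what was pronounced vs. what was intended.
surface = "I sink so"
meaning = "I think so"
print(levenshtein(surface, meaning))  # 2
print(divergence(surface, meaning))   # 0.2
```

Utterances whose two transcripts are identical score 0.0, so a measure like this lets degradation be examined as a function of how far the pronounced form drifts from the intended one.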

Problem

Research questions and friction points this paper is trying to address.

multi-task learning

representational entanglement

second language speech recognition

dual-output ASR

surface transcription degradation

Innovation

Methods, ideas, or system contributions that make the work stand out.

multi-task learning

representational entanglement

dual-output speech recognition