Multi-task Learning is Not Enough: Representational Entanglement in Dual-output Second Language Speech Recognition

Date: 2026-06-04
Citations: 0
Influential: 0

AI Summary
This study addresses the performance degradation in surface transcription commonly observed in dual-output multilingual speech recognition, particularly for English, where surface forms diverge markedly from semantic representations. By comparing Korean and English within a dual-output architecture, the work identifies entangled encoder representations as the primary cause of this degradation: Korean maintains task-specific representations, whereas English exhibits highly entangled ones. The authors quantify surface–semantic divergence using Levenshtein distance and analyze both encoder and decoder representations. Their findings reveal that the semantic decoder demonstrates strong adaptability, while the surface decoder is heavily constrained by the shared encoder, challenging the conventional assumption that shared representations are inherently beneficial. These insights offer new directions for designing effective multitask speech recognition systems.
๐Ÿ“ Abstract
Second-language (L2) speech recognition often requires transcriptions of pronunciations and intended meanings. Multi-task learning (MTL) is a natural approach because it assumes that shared representations benefit both outputs. However, this paper shows that this assumption does not hold across Korean and English. MTL improves meaning but degrades surface transcription, especially in English, where the degradation scales with surface-meaning divergence measured by Levenshtein edit distance. Encoder analysis links these patterns to encoder-level entanglement, with Korean preserving distinct task representations while English produces nearly identical ones. Cross-task decoder analysis shows that the meaning dual-output decoder adapts with a unique representation, while the surface dual-output decoder remains constrained by the encoder. These findings motivate the design of MTL frameworks that mitigate encoder-level entanglement to reduce surface degradation in dual-output L2 automatic speech recognition.
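
To make the divergence measure concrete, here is a minimal Python sketch of Levenshtein edit distance between a surface transcription and its intended-meaning transcription. The abstract does not say whether the distance is computed over characters or words, or whether it is length-normalized, so the character-level comparison, the `divergence` helper, and the example utterance are all illustrative assumptions rather than the authors' setup.

```python
def levenshtein(a, b):
    """Minimum number of insertions, deletions, and substitutions turning a into b.

    Works on any two sequences (strings for character level, lists of
    tokens for word level).
    """
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty prefix of b
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(
                prev[j] + 1,         # delete x
                curr[j - 1] + 1,     # insert y
                prev[j - 1] + cost,  # substitute x with y (or match)
            ))
        prev = curr
    return prev[-1]


def divergence(surface, meaning):
    """Hypothetical helper: edit distance divided by the longer length.

    The normalization is an assumption made for illustration only.
    """
    longest = max(len(surface), len(meaning))
    if longest == 0:
        return 0.0
    return levenshtein(surface, meaning) / longest


# Made-up L2 example: pronounced form vs. intended form.
surface = "I sink so"
meaning = "I think so"
print(levenshtein(surface, meaning), divergence(surface, meaning))
```

Under this sketch, an utterance whose surface and meaning transcriptions are identical scores 0, and the score grows as the pronounced form drifts from the intended form, which is the quantity the abstract says surface degradation scales with in English.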
Problem

Research questions and friction points this paper is trying to address.

multi-task learning
representational entanglement
second language speech recognition
dual-output ASR
surface transcription degradation
Innovation

Methods, ideas, or system contributions that make the work stand out.

multi-task learning
representational entanglement
dual-output speech recognition
second-language ASR
encoder analysis