Efficient Dialect-Aware Modeling and Conditioning for Low-Resource Taiwanese Hakka Speech Processing

📅 2026-02-25
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the challenges of automatic speech recognition (ASR) for Taiwanese Hakka, a low-resource endangered language characterized by substantial dialectal variation and dual orthographic systems—Han characters and romanized phonetic scripts. The authors propose a unified RNN-T–based framework that disentangles linguistic content from dialectal style through dialect-aware modeling and introduces a parameter-efficient prediction network to jointly optimize recognition across both writing systems. Cross-system mutual regularization is further incorporated to enhance model generalization. This study presents the first systematic analysis of how Hakka dialectal variation impacts ASR performance and establishes the first single-model architecture capable of jointly recognizing both Han characters and romanized transcriptions. Evaluated on the HAT corpus, the proposed approach achieves relative error rate reductions of 57.00% and 40.41% on the Han-character and romanized orthographies, respectively.

📝 Abstract
Taiwanese Hakka is a low-resource, endangered language that poses significant challenges for automatic speech recognition (ASR), including high dialectal variability and the presence of two distinct writing systems (Hanzi and Pinyin). Traditional ASR models often encounter difficulties in this context, as they tend to conflate essential linguistic content with dialect-specific variations across both phonological and lexical dimensions. To address these challenges, we propose a unified framework grounded in the Recurrent Neural Network Transducer (RNN-T). Central to our approach is the introduction of dialect-aware modeling strategies designed to disentangle dialectal "style" from linguistic "content", which enhances the model's capacity to learn robust and generalized representations. Additionally, the framework employs parameter-efficient prediction networks to concurrently model Hanzi and Pinyin ASR. We demonstrate that these tasks create a powerful synergy, wherein the cross-script objective serves as a mutual regularizer that improves the primary ASR tasks. Experiments conducted on the HAT corpus reveal that our model achieves 57.00% and 40.41% relative error rate reductions on Hanzi and Pinyin ASR, respectively. To our knowledge, this is the first systematic investigation into the impact of Hakka dialectal variations on ASR and the first single model capable of jointly addressing these tasks.
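The abstract's training objective—two script-specific branches over a shared encoder, tied together by a cross-script regularizer—can be sketched as follows. This is an illustrative assumption of how such a joint loss might be composed, not the paper's exact formulation: the per-script terms stand in for RNN-T losses, the KL-based agreement term stands in for the cross-script mutual regularization, and the weight `lam` is a hypothetical hyperparameter.

```python
# Hypothetical sketch: joint Hanzi/Pinyin objective with a cross-script
# regularizer. Real RNN-T losses are replaced by simple cross-entropy
# stand-ins so the sketch stays self-contained.
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_entropy(logits, targets):
    # Mean negative log-likelihood of integer targets.
    p = softmax(logits)
    return float(-np.mean(np.log(p[np.arange(len(targets)), targets] + 1e-12)))

def joint_objective(han_logits, han_tgt, pin_logits, pin_tgt, lam=0.3):
    # Per-script recognition losses (stand-ins for the RNN-T losses).
    l_han = cross_entropy(han_logits, han_tgt)
    l_pin = cross_entropy(pin_logits, pin_tgt)
    # Cross-script regularizer: KL divergence encouraging the two
    # branches' output distributions (assumed projected to a shared
    # label space here) to agree.
    p_han, p_pin = softmax(han_logits), softmax(pin_logits)
    kl = float(np.mean(np.sum(
        p_han * (np.log(p_han + 1e-12) - np.log(p_pin + 1e-12)), axis=-1)))
    return l_han + l_pin + lam * kl
```

When the two branches produce identical distributions, the KL term vanishes and the objective reduces to the sum of the two recognition losses, so the regularizer only penalizes cross-script disagreement.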
Problem

Research questions and friction points this paper is trying to address.

low-resource
dialectal variability
automatic speech recognition
Taiwanese Hakka
writing systems
Innovation

Methods, ideas, or system contributions that make the work stand out.

dialect-aware modeling
RNN-T
low-resource ASR
cross-script regularization
Taiwanese Hakka
An-Ci Peng
Dept. Computer Science and Information Engineering, National Taiwan Normal University, Taiwan
Kuan-Tang Huang
Dept. Computer Science and Information Engineering, National Taiwan Normal University, Taiwan
Tien-Hong Lo
Dept. Computer Science and Information Engineering, National Taiwan Normal University, Taiwan
Hung-Shin Lee
North Co., Ltd., Taiwan
Speech Processing
Hsin-Min Wang
Research Fellow/Professor, Institute of Information Science, Academia Sinica
Spoken Language Processing, Natural Language Processing, Multimedia Information Retrieval, Machine Learning
Berlin Chen
Professor of Computer Science and Information Engineering, National Taiwan Normal University
speech and natural language processing, computer-assisted language learning, machine learning