🤖 AI Summary
This study investigates the quantitative relationship between accent strength and articulatory features—particularly tongue position—implicitly encoded in acoustic speech signals. We propose a self-supervised articulatory inversion framework that jointly models phoneme-level acoustic disparities and the acoustic-to-articulatory mapping to estimate articulatory parameters (e.g., tongue position) from speech in an unsupervised manner. To enhance interpretability and robustness, we incorporate dictionary-based transcription constraints as supervision proxies. Experiments on American and British English corpora reveal that accent strength correlates significantly with systematic tongue-position shifts for specific phonemes—including retroflex /r/ and low back vowels /ɑː/, /ɔː/—and that inter-dialectal articulatory patterns differ systematically. To our knowledge, this is the first work to establish an interpretable, quantifiable link between accent strength and fine-grained, physiologically grounded articulatory parameters. The findings advance accent modeling, spoken-language assessment, and personalized speech technologies.
📝 Abstract
This paper explores the relationship between accent strength and articulatory features inferred from acoustic speech. To quantify accent strength, we compare phonetic transcriptions with dictionary-based reference transcriptions and use the phoneme-level differences as the measure. The proposed framework uses recent self-supervised articulatory inversion techniques to estimate articulatory features. Analyzing a corpus of read speech from American and British English speakers, we examine correlations between the derived articulatory parameters and the accent strength proxies, associating systematic articulatory differences with indexed accent strength. Results indicate that tongue positioning patterns distinguish the two dialects, with notable inter-dialect differences in rhotic and low back vowels. These findings contribute to automated accent analysis and articulatory modeling for speech processing applications.
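The phoneme-level comparison described in the abstract can be illustrated with a minimal sketch. The abstract does not state the exact distance used, so this example assumes a Levenshtein (edit) distance between the dictionary-based reference and the phonetic transcription, normalized by the reference length. The function names `phoneme_edit_distance` and `accent_strength` and the example transcriptions are hypothetical and are not taken from the paper.

```python
from typing import Sequence


def phoneme_edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two phoneme sequences (assumed metric)."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # delete a reference phoneme
                curr[j - 1] + 1,     # insert an observed phoneme
                prev[j - 1] + cost,  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


def accent_strength(ref: Sequence[str], hyp: Sequence[str]) -> float:
    """Edit distance normalized by reference length (hypothetical proxy)."""
    if not ref:
        raise ValueError("reference transcription must not be empty")
    return phoneme_edit_distance(ref, hyp) / len(ref)


# Hypothetical example: a rhotic dictionary form of "car" compared with a
# non-rhotic realization that drops the final /r/.
dictionary_ref = ["k", "ɑː", "r"]
observed = ["k", "ɑː"]
score = accent_strength(dictionary_ref, observed)
```

Under this assumption, a larger score means the observed pronunciation departs further from the dictionary reference. A per-speaker proxy could then be obtained by averaging such scores over utterances before correlating them with the estimated articulatory parameters.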