🤖 AI Summary
To address the high dimensionality and limited interpretability of vocal tract representations in articulatory inversion and speech synthesis, this paper proposes ARTI-6, a six-dimensional articulatory speech coding framework grounded in real-time MRI data that characterizes the dynamic movements of key vocal tract regions, including the velum, tongue root, and larynx. Methodologically, the authors design a physiologically interpretable and computationally efficient low-dimensional encoding; leverage speech foundation models for articulatory inversion; and build models that map between acoustics and articulation in both directions. Experiments show an articulatory inversion correlation of 0.87 and demonstrate that intelligible, natural-sounding speech can be reconstructed from only six articulatory dimensions, supporting the sufficiency of this ultra-low-dimensional representation. The code and speech samples are publicly released, providing a foundation for physiological speech modeling, cross-modal speech generation, and clinical speech technologies.
📝 Abstract
We propose ARTI-6, a compact six-dimensional articulatory speech encoding framework derived from real-time MRI data that captures crucial vocal tract regions including the velum, tongue root, and larynx. ARTI-6 consists of three components: (1) a six-dimensional articulatory feature set representing key regions of the vocal tract; (2) an articulatory inversion model, which predicts articulatory features from speech acoustics leveraging speech foundation models, achieving a prediction correlation of 0.87; and (3) an articulatory synthesis model, which reconstructs intelligible speech directly from articulatory features, showing that even a low-dimensional representation can generate natural-sounding speech. Together, ARTI-6 provides an interpretable, computationally efficient, and physiologically grounded framework for advancing articulatory inversion, synthesis, and broader speech technology applications. The source code and speech samples are publicly available.
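The abstract describes a two-model pipeline around a six-dimensional articulatory code: inversion (acoustics → articulatory features) and synthesis (articulatory features → speech). A minimal toy sketch of that pipeline shape, using a least-squares stand-in for the real foundation-model-based inversion (all data, dimensions other than the 6-D code, and the linear model here are hypothetical illustrations, not the paper's implementation):

```python
import numpy as np

# Toy sketch of the ARTI-6 pipeline shape: per-frame acoustic features are
# "inverted" to a 6-dimensional articulatory vector. The linear model and
# mock data below are stand-ins, not the paper's method.
rng = np.random.default_rng(0)

N_FRAMES, ACOUSTIC_DIM, ARTI_DIM = 200, 80, 6  # ARTI_DIM = 6 per the paper

# Mock data: acoustic features and a noisy linear articulatory trace.
acoustics = rng.standard_normal((N_FRAMES, ACOUSTIC_DIM))
true_map = rng.standard_normal((ACOUSTIC_DIM, ARTI_DIM))
articulatory = acoustics @ true_map + 0.1 * rng.standard_normal((N_FRAMES, ARTI_DIM))

# "Inversion": least-squares fit standing in for the speech-foundation-model
# predictor described in the abstract.
W, *_ = np.linalg.lstsq(acoustics, articulatory, rcond=None)
pred = acoustics @ W

# Evaluate as the paper does, via correlation between predicted and true
# articulatory trajectories (the paper reports 0.87 for its real model;
# this toy fit scores near 1.0 only because the mock data is linear).
corrs = [np.corrcoef(pred[:, d], articulatory[:, d])[0, 1] for d in range(ARTI_DIM)]
print(round(float(np.mean(corrs)), 3))
```

The synthesis direction would analogously map the six-dimensional trace back to a waveform; the point of the sketch is only the data-flow shape, with a 6-D bottleneck between the two models.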