CLARITY: Contextual Linguistic Adaptation and Accent Retrieval for Dual-Bias Mitigation in Text-to-Speech Generation

📅 2025-11-14

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This paper addresses dual biases in text-to-speech (TTS) systems: accent bias—overreliance on dominant pronunciation patterns—and linguistic bias—neglect of dialectal lexicon and cultural cues. We propose a dual-signal optimization framework that decouples accent fidelity modeling from dialectal text localization. Our approach integrates contextual language adaptation, retrieval-augmented accent prompting (RAAP), and instruction-guided generation to achieve fair, culturally grounded multi-accent speech synthesis. The method is architecture-agnostic, requiring no modification to core TTS models. Evaluations across 12 English accents demonstrate significant improvements in accent identification accuracy and generation fairness (+18.7% average fairness score), while preserving high naturalness (MOS ≥ 4.1). This work establishes a scalable, inclusive paradigm for equitable TTS synthesis.

Technology Category

Application Category

📝 Abstract

Instruction-guided text-to-speech (TTS) research has reached a maturity level where excellent speech generation quality is possible on demand, yet two coupled biases persist: accent bias, where models default to dominant phonetic patterns, and linguistic bias, where dialect-specific lexical and cultural cues are ignored. These biases are interdependent, as authentic accent generation requires both accent fidelity and localized text. We present Contextual Linguistic Adaptation and Retrieval for Inclusive TTS sYnthesis (CLARITY), a backbone-agnostic framework that addresses these biases through dual-signal optimization: (i) contextual linguistic adaptation that localizes input text to the target dialect, and (ii) retrieval-augmented accent prompting (RAAP) that supplies accent-consistent speech prompts. Across twelve English accents, CLARITY improves accent accuracy and fairness while maintaining strong perceptual quality.

Problem

Research questions and friction points this paper is trying to address.

Mitigates accent bias in text-to-speech generation systems

Addresses linguistic bias by localizing dialect-specific lexical cues

Resolves interdependent biases between accent fidelity and cultural context

Innovation

Methods, ideas, or system contributions that make the work stand out.

Contextual linguistic adaptation localizes input text to target dialect

Retrieval-augmented accent prompting supplies accent-consistent speech prompts

Dual-signal optimization addresses accent and linguistic biases simultaneously

🔎 Similar Papers

No similar papers found.

Authors to Follow