AI Summary
Deploying high-quality text-to-speech (TTS) models on resource-constrained edge devices demands simultaneous optimization of synthesis fidelity and model footprint. This paper proposes an ultra-lightweight TTS compression framework featuring two key innovations: (1) the first 1.58-bit quantization-aware training (QAT) for edge TTS, which constrains weights to ternary values {-1, 0, 1}; and (2) Weight Indexing, a novel technique that maps each group of low-bit weights to a single int8 index, drastically reducing both memory storage and computational overhead. Evaluated on standard benchmarks, the method achieves an 83% model-size reduction while preserving speech naturalness and outperforming a non-quantized baseline of comparable size. The core contributions are: (i) the first edge-deployable TTS system supporting 1.58-bit QAT; (ii) an efficient int8-indexed weight representation scheme; and (iii) real-time, high-fidelity speech synthesis on severely resource-limited hardware.
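The 1.58-bit QAT step can be illustrated with a minimal sketch. The quantizer below uses the absmean ternary rule popularized by BitNet b1.58 (per-tensor scale gamma = mean |w|, then round and clip to {-1, 0, 1}); the paper's exact quantizer and scaling rule are not specified here, so treat this as an assumption, not the authors' implementation.

```python
import numpy as np

def ternary_quantize(w, eps=1e-8):
    """Quantize a float weight tensor to ternary values {-1, 0, 1}.

    Absmean-style quantizer (assumed, in the style of BitNet b1.58):
    w is approximated as gamma * w_q, where gamma is a per-tensor scale.
    Returns the ternary int8 tensor and the scale gamma.
    """
    gamma = np.abs(w).mean() + eps             # per-tensor scale
    w_q = np.clip(np.round(w / gamma), -1, 1)  # ternary {-1, 0, 1}
    return w_q.astype(np.int8), gamma

# During QAT the forward pass would use gamma * w_q in place of w, while
# the backward pass passes gradients straight through to the full-precision
# w (straight-through estimator), so the model learns to tolerate ternary weights.
```

For example, `ternary_quantize(np.array([0.4, -0.05, 1.2, -0.9]))` yields the ternary weights `[1, 0, 1, -1]` with `gamma ≈ 0.64`.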
Abstract
This paper proposes a highly compact, lightweight text-to-speech (TTS) model for on-device applications. To reduce the model size, the proposed model introduces two techniques. First, we introduce quantization-aware training (QAT), which quantizes model parameters during training to as low as 1.58 bits. In this case, most of the 32-bit model parameters are quantized to ternary values {-1, 0, 1}. Second, we propose a method named weight indexing, in which a group of 1.58-bit weights is stored as a single int8 index. This allows for efficient storage of model parameters, even on hardware that handles values in 8-bit units. Experimental results demonstrate that the proposed method achieves an 83% reduction in model size while outperforming, in synthesis quality, a non-quantized baseline of similar model size.
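A natural instantiation of weight indexing is base-3 packing: since 3^5 = 243 ≤ 256, five ternary weights fit in one int8 index, i.e. 8/5 = 1.6 bits per weight, matching the 1.58-bit (log2 3) claim up to rounding. The group size of 5 and the base-3 encoding are assumptions for illustration; the abstract does not specify the exact indexing scheme.

```python
import numpy as np

GROUP = 5  # assumed group size: 3**5 = 243 <= 256, so 5 ternary weights per int8

def pack_ternary(w_q):
    """Map each group of GROUP ternary weights {-1, 0, 1} to one uint8 index."""
    digits = (np.asarray(w_q, dtype=np.int64) + 1).reshape(-1, GROUP)  # shift to {0, 1, 2}
    powers = 3 ** np.arange(GROUP)
    return (digits @ powers).astype(np.uint8)  # base-3 value of each group

def unpack_ternary(idx):
    """Recover the ternary weights from the uint8 indices (in practice via a LUT)."""
    idx = np.asarray(idx, dtype=np.int64)[:, None]
    digits = (idx // 3 ** np.arange(GROUP)) % 3  # base-3 digits of each index
    return (digits - 1).reshape(-1).astype(np.int8)
```

At inference time, unpacking would typically be a 256-entry lookup table from each int8 index to its five ternary weights, which keeps both storage and decode cost low on byte-addressed hardware.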