BitTTS: Highly Compact Text-to-Speech Using 1.58-bit Quantization and Weight Indexing

๐Ÿ“… 2025-06-04
๐Ÿ“ˆ Citations: 0
โœจ Influential: 0
๐Ÿ“„ PDF
๐Ÿค– AI Summary
Deploying high-quality text-to-speech (TTS) models on resource-constrained edge devices demands simultaneous optimization of synthesis fidelity and model footprint. This paper proposes an ultra-lightweight TTS compression framework featuring two key innovations: (1) the first 1.58-bit quantization-aware training (QAT) for edge TTS, which quantizes most weights to ternary values {−1, 0, 1} (log2 3 ≈ 1.585 bits per weight); and (2) weight indexing, a technique that stores each group of low-bit weights as a single int8 index, reducing both storage and memory overhead on byte-addressable hardware. Evaluated on standard benchmarks, the method achieves an 83% model size reduction while preserving speech naturalness and outperforming non-quantized baselines of comparable size. The core contributions are: (i) the first edge-deployable TTS system trained with 1.58-bit QAT; (ii) an efficient int8-indexed weight representation scheme; and (iii) real-time, high-fidelity speech synthesis on severely resource-limited hardware.
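The 1.58-bit quantization described above can be sketched as absmean ternary quantization. This is a minimal NumPy illustration, assuming a per-tensor absmean scale in the style of BitNet b1.58-type ternary QAT; the paper's exact quantizer is not specified here.

```python
import numpy as np

def quantize_ternary(w: np.ndarray) -> tuple[np.ndarray, float]:
    """Quantize a float weight tensor to ternary codes {-1, 0, 1}.

    Assumed scheme: divide by the mean absolute weight (absmean scale),
    then round and clip to the ternary grid.
    """
    scale = float(np.abs(w).mean()) + 1e-8    # per-tensor scale
    q = np.clip(np.round(w / scale), -1, 1)   # ternary codes
    return q.astype(np.int8), scale

def dequantize_ternary(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover approximate float weights from ternary codes."""
    return q.astype(np.float32) * scale

w = np.array([0.9, -0.05, -1.2, 0.4], dtype=np.float32)
q, s = quantize_ternary(w)
print(q.tolist())  # -> [1, 0, -1, 1], with absmean scale ~ 0.6375
```

Each 32-bit weight collapses to one of three codes plus a shared float scale, which is where the bulk of the 83% size reduction comes from.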

๐Ÿ“ Abstract
This paper proposes a highly compact, lightweight text-to-speech (TTS) model for on-device applications. To reduce the model size, the proposed model introduces two techniques. First, we introduce quantization-aware training (QAT), which quantizes model parameters during training to as low as 1.58 bits; most 32-bit model parameters are thereby quantized to ternary values {-1, 0, 1}. Second, we propose a method named weight indexing, which saves a group of 1.58-bit weights as a single int8 index. This allows for efficient storage of model parameters even on hardware that treats values in units of 8 bits. Experimental results demonstrate that the proposed method achieved an 83% reduction in model size while outperforming a non-quantized baseline of similar model size in synthesis quality.
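To illustrate the weight-indexing idea: a group of five ternary weights has 3^5 = 243 possible patterns, so each group fits into one byte (1.6 bits per weight, close to the log2 3 ≈ 1.585-bit ideal). The sketch below assumes a group size of five and base-3 encoding; the paper's exact grouping scheme is not given here.

```python
import numpy as np

GROUP = 5  # 3**5 = 243 <= 256: five ternary weights per one-byte index

def pack_ternary(q: np.ndarray) -> np.ndarray:
    """Pack ternary codes {-1, 0, 1} into one uint8 index per group of 5."""
    assert q.size % GROUP == 0
    digits = q.reshape(-1, GROUP) + 1        # map {-1, 0, 1} -> {0, 1, 2}
    powers = 3 ** np.arange(GROUP)           # base-3 place values
    return (digits * powers).sum(axis=1).astype(np.uint8)

def unpack_ternary(idx: np.ndarray) -> np.ndarray:
    """Recover the flat ternary codes from the packed byte indices."""
    digits = idx[:, None] // (3 ** np.arange(GROUP)) % 3
    return (digits - 1).astype(np.int8).reshape(-1)

q = np.array([1, 0, -1, 1, -1, 0, 0, 1, -1, 1], dtype=np.int8)
packed = pack_ternary(q)                     # 10 weights -> 2 bytes
assert (unpack_ternary(packed) == q).all()   # lossless round trip
```

At inference time, unpacking can be a 256-entry lookup table from each byte index to its five ternary weights, which keeps the decoding cost to a single table read per group.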
Problem

Research questions and friction points this paper is trying to address.

Reducing TTS model size enough for on-device deployment without degrading synthesis quality
Representing 1.58-bit (ternary) parameters on hardware that addresses memory in 8-bit units
Maintaining naturalness when training with extremely low-bit quantized weights
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses 1.58-bit quantization for compactness
Employs weight indexing for efficient storage
Quantization-aware training enhances synthesis quality
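The QAT ingredient above is typically implemented as a fake-quantization forward pass with a straight-through estimator (STE): the forward pass sees ternary weights, while gradients update the latent float weights. This is a standard QAT pattern, sketched here as an assumption about the training setup rather than the paper's exact recipe.

```python
import numpy as np

def ternary_ste(w: np.ndarray) -> np.ndarray:
    """Fake-quantize weights to ternary values for the forward pass.

    In an autodiff framework this would be written as
    w + stop_gradient(w_q - w), so the backward pass treats the
    quantizer as the identity (straight-through estimator) and
    gradients flow to the latent float weights unchanged.
    """
    scale = np.abs(w).mean() + 1e-8
    return np.clip(np.round(w / scale), -1, 1) * scale

rng = np.random.default_rng(0)
w = rng.normal(size=(4, 3))     # latent float weights kept during training
x = rng.normal(size=(2, 4))
y = x @ ternary_ste(w)          # forward pass sees only ternary weights
```

Training against the quantized forward pass is what lets the model recover the quality lost by naive post-training ternarization.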
๐Ÿ”Ž Similar Papers
No similar papers found.