The Uneven Impact of Post-Training Quantization in Machine Translation

📅 2025-08-28
📈 Citations: 0
Influential: 0
🤖 AI Summary
The impact of quantization on multilingual machine translation, particularly for low-resource languages, remains poorly understood. Method: We systematically evaluate four post-training quantization methods (AWQ, BitsAndBytes, GGUF, and AutoRound) across 55 languages at 4-bit and 2-bit precision. Contribution/Results: Our study is the first to reveal that quantization error intensifies as language resource availability decreases and varies across language families: 2-bit quantization severely degrades translation quality for low-resource languages, whereas 4-bit quantization largely preserves performance for high-resource languages. GGUF demonstrates superior robustness under 2-bit quantization. Furthermore, we validate that language-matched calibration effectively mitigates low-bit degradation. These findings provide empirical evidence and practical guidance for lightweight deployment of multilingual LLMs in resource-constrained settings.

📝 Abstract
Quantization is essential for deploying large language models (LLMs) on resource-constrained hardware, but its implications for multilingual tasks remain underexplored. We conduct the first large-scale evaluation of post-training quantization (PTQ) on machine translation across 55 languages using five LLMs ranging from 1.7B to 70B parameters. Our analysis reveals that while 4-bit quantization often preserves translation quality for high-resource languages and large models, significant degradation occurs for low-resource and typologically diverse languages, particularly in 2-bit settings. We compare four quantization techniques (AWQ, BitsAndBytes, GGUF, and AutoRound), showing that algorithm choice and model size jointly determine robustness. GGUF variants provide the most consistent performance, even at 2-bit precision. Additionally, we quantify the interactions between quantization, decoding hyperparameters, and calibration languages, finding that language-matched calibration offers benefits primarily in low-bit scenarios. Our findings offer actionable insights for deploying multilingual LLMs for machine translation under quantization constraints, especially in low-resource settings.
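The abstract contrasts 4-bit quantization, which often preserves translation quality, with 2-bit quantization, which degrades it sharply. As a rough intuition for that gap, here is a minimal toy sketch of symmetric round-to-nearest (RTN) per-tensor quantization. It is an illustration only, not an implementation of AWQ, BitsAndBytes, GGUF, or AutoRound, and all names (`quantize_rtn`, the sample weights) are invented for this example.

```python
# Toy symmetric round-to-nearest (RTN) quantization, for intuition only.
# Not any of the paper's methods (AWQ, BitsAndBytes, GGUF, AutoRound).

def quantize_rtn(weights, bits):
    """Quantize then dequantize with a symmetric absmax scheme."""
    qmax = 2 ** (bits - 1) - 1                 # 7 for 4-bit, 1 for 2-bit
    scale = max(abs(w) for w in weights) / qmax
    q = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return [qi * scale for qi in q]            # dequantized weights

def mean_abs_error(a, b):
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

# Invented sample weights standing in for one tensor of an LLM.
weights = [0.82, -0.41, 0.13, -0.90, 0.05, 0.37, -0.66, 0.24]
err4 = mean_abs_error(weights, quantize_rtn(weights, 4))
err2 = mean_abs_error(weights, quantize_rtn(weights, 2))
# err2 comes out many times larger than err4: at 2 bits the grid has only
# three representable values, so most weights collapse toward zero.
```

With only three levels at 2-bit precision, the rounding error dwarfs the 4-bit case; the paper's finding is that this extra error falls disproportionately on low-resource languages.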
Problem

Research questions and friction points this paper is trying to address.

Evaluating post-training quantization impact on multilingual machine translation
Assessing quality degradation in low-resource languages under quantization
Comparing quantization techniques and calibration strategies for optimal deployment
Innovation

Methods, ideas, or system contributions that make the work stand out.

Evaluates post-training quantization across 55 languages
Compares four quantization techniques including GGUF
Uses language-matched calibration for low-bit scenarios