Smaller = Weaker? Benchmarking Robustness of Quantized LLMs in Code Generation

📅 2025-06-28
📈 Citations: 0
Influential: 0
🤖 AI Summary
While quantization is widely adopted to compress large language models (LLMs) for efficient inference, its impact on model robustness, particularly in code generation, has been largely overlooked. Method: This work presents the first systematic study of quantization's effects on LLM robustness across four major model families (LLaMA, DeepSeek, CodeGen, and StarCoder) spanning parameter scales from 350M to 33B. It proposes a dual-perspective evaluation framework: input-side robustness under adversarial prompt attacks and model-side robustness under weight perturbations, with experiments covering multiple precision settings (e.g., INT4, FP16, mixed-precision). Contribution/Results: Quantized models outperform their full-precision counterparts in robustness across 51.59% of adversarial scenarios, challenging the prevailing assumption that quantization inherently degrades reliability. The findings demonstrate that quantization can simultaneously enhance inference efficiency and robustness, providing practical guidelines for developing trustworthy, lightweight code-generation models.
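The input-side half of the evaluation perturbs prompts and checks whether generated code still passes a functional check. A minimal sketch of that loop, assuming a hypothetical `model` callable and `checker` predicate (the paper's actual attack methods and test harness are not specified here):

```python
import random

def perturb_prompt(prompt: str, rate: float = 0.1, seed: int = 0) -> str:
    """One simple character-level attack: randomly swap a fraction of
    adjacent character pairs in the prompt."""
    rng = random.Random(seed)
    chars = list(prompt)
    n_swaps = max(1, int(len(chars) * rate))
    for _ in range(n_swaps):
        i = rng.randrange(len(chars) - 1)
        chars[i], chars[i + 1] = chars[i + 1], chars[i]
    return "".join(chars)

def robustness_score(model, prompts, checker) -> float:
    """Fraction of adversarially perturbed prompts for which the model's
    generated code still passes the functional checker."""
    passed = sum(bool(checker(model(perturb_prompt(p)))) for p in prompts)
    return passed / len(prompts)

# Toy usage with a stand-in "model" that always emits the same snippet.
model = lambda p: "def add(a, b):\n    return a + b"
checker = lambda code: "return" in code
prompts = ["Write a function that adds two numbers."] * 4
score = robustness_score(model, prompts, checker)
```

Running the same scorer on a full-precision model and its quantized variant gives the per-scenario comparison the 51.59% figure summarizes.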

📝 Abstract
Quantization has emerged as a mainstream method for compressing Large Language Models (LLMs), reducing memory requirements and accelerating inference without architectural modifications. While existing research primarily focuses on evaluating the effectiveness of quantized LLMs compared to their original counterparts, the impact on robustness remains largely unexplored. In this paper, we present the first systematic investigation of how quantization affects the robustness of LLMs in code generation tasks. Through extensive experiments across four prominent LLM families (LLaMA, DeepSeek, CodeGen, and StarCoder) with parameter scales ranging from 350M to 33B, we evaluate robustness from dual perspectives: adversarial attacks on input prompts and noise perturbations on model weights. Our findings challenge conventional wisdom by demonstrating that quantized LLMs often exhibit superior robustness compared to their full-precision counterparts: quantized LLMs showed better resilience in 51.59% of our adversarial experiments, versus 42.86% for full-precision models. Similarly, our noise perturbation experiments confirm that LLMs after quantization generally withstand higher levels of weight disturbance. These results suggest that quantization not only reduces computational requirements but can actually enhance LLMs' reliability in code generation tasks, providing valuable insights for developing more robust and efficient LLM deployment strategies.
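The model-side perspective can be pictured as injecting Gaussian noise into the weights at increasing levels and recording the largest level at which task accuracy stays above a threshold. A minimal sketch, assuming a flat weight vector and a hypothetical `evaluate` function standing in for the paper's code-generation benchmark:

```python
import random

def add_weight_noise(weights, sigma, seed=0):
    """Model-side perturbation: add zero-mean Gaussian noise with the
    given standard deviation to every weight."""
    rng = random.Random(seed)
    return [w + rng.gauss(0.0, sigma) for w in weights]

def max_tolerated_sigma(weights, evaluate, sigmas, threshold):
    """Largest noise level at which the perturbed model still meets the
    accuracy threshold -- a simple proxy for model-side robustness."""
    tolerated = 0.0
    for sigma in sorted(sigmas):
        if evaluate(add_weight_noise(weights, sigma)) >= threshold:
            tolerated = sigma
        else:
            break
    return tolerated

# Toy usage: a stand-in evaluator that scores by closeness to the clean
# weights (a real study would rerun the code-generation benchmark).
weights = [0.5] * 64
evaluate = lambda w: max(0.0, 1.0 - sum(abs(a - b) for a, b in zip(w, weights)) / len(w))
tolerated = max_tolerated_sigma(weights, evaluate, [0.01, 0.05, 0.2, 1.0], threshold=0.5)
```

Comparing the tolerated noise level of a quantized model against its full-precision counterpart yields the paper's finding that quantized LLMs generally withstand higher weight disturbance.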
Problem

Research questions and friction points this paper is trying to address.

Assessing robustness of quantized LLMs in code generation
Comparing quantized vs full-precision LLM resilience to attacks
Evaluating noise perturbation effects on quantized model performance
Innovation

Methods, ideas, or system contributions that make the work stand out.

Systematic robustness evaluation of quantized LLMs
Adversarial and noise perturbation dual-perspective testing
Quantized LLMs show superior robustness in code generation