🤖 AI Summary
This work addresses why post-training quantization (PTQ) often fails at ultra-low bitwidths while quantization-aware training (QAT) can recover the lost accuracy. The authors propose a unified geometric framework that models full-precision training as following a low-loss “river” inside a wider valley, where a neighborhood of the river forms a nearly flat “basin.” PTQ fails when the quantization grid is comparable to the basin width and the selected quantized weights land outside the basin, in a high-loss region. QAT based on the straight-through estimator computes gradients at the quantized points, which introduces an inward bias that steers subsequent quantized iterates back into the basin. The authors formalize this mechanism with a local landscape model and prove finite-time QAT recovery under local quantizer-compatibility assumptions. Experiments across vision and language models under multiple quantization schemes corroborate the predicted basin-crossing failure of PTQ and the corresponding recovery mechanism of QAT.
📝 Abstract
Post-training quantization (PTQ) converts a trained full-precision model into low-bit weights without task-level retraining, while quantization-aware training (QAT) incorporates quantization into the training loop. Although PTQ is efficient and often accurate at moderate bitwidths, it can fail sharply at aggressive bitwidths; QAT is more expensive but can often recover the lost accuracy. We propose a unified geometric framework that explains both PTQ failure and QAT recovery. We model full-precision training as following a low-loss “river” inside a wider “valley”: a normal neighborhood of the river forms a nearly flat “basin”, while leaving this basin incurs a sharp loss increase. When the quantization grid is comparable to the basin width, local PTQ objectives, including rounding and Hessian-based second-order reconstruction, can select a high-loss deployed quantized point outside the basin even when nearby low-loss quantized points exist. In this regime, straight-through-estimator-based QAT has a useful bias: it evaluates gradients at the deployed quantized weights while updating latent full-precision weights, causing the gradient to sense the valley wall and acquire an inward component that steers subsequent quantized iterates back into the basin. We formalize this mechanism through a local landscape model, construct a geometric PTQ failure mode, and prove finite-time QAT recovery under local quantizer-compatibility assumptions. Experiments across vision and language models under multiple neural-network quantization schemes corroborate the predicted basin-crossing failure of PTQ and the corresponding recovery mechanism of QAT.
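The basin-crossing failure and the straight-through recovery described in the abstract can be illustrated with a small one-dimensional toy, offered here only as a sketch and not as the authors' landscape model or experimental setup. It assumes a hypothetical cross-section normal to the river with a flat basin on the interval [0.3, 1.2], quadratic valley walls outside it, and a uniform grid of step 1.0 (comparable to the basin width of 0.9). All constants and function names (`loss`, `grad`, `quantize`, `ste_qat`) are illustrative assumptions.

```python
import math

# Hypothetical 1-D cross-section normal to the river; every constant is illustrative.
BASIN_LO, BASIN_HI = 0.3, 1.2   # flat basin: loss is zero on this interval
WALL_STIFFNESS = 100.0          # steepness of the valley wall outside the basin
GRID_STEP = 1.0                 # quantization step, comparable to the basin width (0.9)


def loss(w):
    """Zero inside the basin, quadratic in the distance to the basin outside it."""
    d = max(BASIN_LO - w, 0.0, w - BASIN_HI)
    return 0.5 * WALL_STIFFNESS * d * d


def grad(w):
    """Derivative of loss(w); it points away from the basin, so descent moves inward."""
    if w < BASIN_LO:
        return WALL_STIFFNESS * (w - BASIN_LO)
    if w > BASIN_HI:
        return WALL_STIFFNESS * (w - BASIN_HI)
    return 0.0


def quantize(w):
    """Round-to-nearest onto the uniform grid (ties round up)."""
    return GRID_STEP * math.floor(w / GRID_STEP + 0.5)


def ste_qat(w_latent, lr=0.01, steps=5):
    """Straight-through QAT: evaluate the gradient at the quantized weight,
    then apply it to the latent full-precision weight."""
    history = []
    for _ in range(steps):
        q = quantize(w_latent)
        history.append((w_latent, q, loss(q)))
        w_latent -= lr * grad(q)  # gradient taken at q, update applied to the latent weight
    return w_latent, history


w_fp = 0.4                       # full-precision weight: inside the basin, near the left wall
q_ptq = quantize(w_fp)           # nearest grid point lies outside the basin
w_qat, history = ste_qat(w_fp)   # latent weight is pushed inward by the wall gradient
q_qat = quantize(w_qat)          # now rounds to the grid point inside the basin

print("PTQ:", q_ptq, loss(q_ptq))
print("QAT:", q_qat, loss(q_qat))
```

In this toy, round-to-nearest sends the full-precision weight across the left wall even though the grid point at 1.0 sits inside the basin. Because the gradient is evaluated at the quantized point, it is nonzero while the latent weight itself has zero gradient, and a single update moves the latent weight far enough that it rounds into the basin. The step size and wall stiffness were chosen so that this happens in one step; other values would change the number of steps, and a real network would not reduce to a single coordinate.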