🤖 AI Summary
This work addresses the problem that quantization error bounds for post-training weight quantization of deep convolutional neural networks (DCNNs) deteriorate sharply with increasing network depth. To tackle this, we propose an inter-layer parametric modeling approach, establishing, for the first time, a depth-sensitive theoretical bound on quantization error. By leveraging Lipschitz continuity analysis and explicitly modeling error propagation across layers, our method substantially alleviates the bound's dependence on network depth. Experiments on MobileNetV2 and ResNets demonstrate that the new bound is orders of magnitude tighter than state-of-the-art bounds, with the theoretical upper bound closely approximating the actual quantization distortion. This advancement provides a more reliable and scalable theoretical foundation for low-bit model deployment, significantly improving the predictability and controllability of quantization accuracy, particularly in deep networks.
📝 Abstract
This paper introduces novel theoretical approximation bounds for the output of quantized neural networks, with a focus on convolutional neural networks (CNNs). By considering layerwise parametrization and focusing on the quantization of weights, we provide bounds that gain several orders of magnitude compared to state-of-the-art results on classical deep convolutional neural networks such as MobileNetV2 or ResNets. These gains are achieved by improving the behaviour of the approximation bounds with respect to the depth parameter, which has the most impact on the approximation error induced by quantization. To complement our theoretical results, we provide a numerical exploration of our bounds on MobileNetV2 and ResNets.
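The layerwise error-propagation idea described above can be illustrated with a minimal numeric sketch. This is not the paper's actual bound; it implements the standard perturbation recursion E_{l+1} <= L_l * E_l + delta_l, where L_l is a layer's Lipschitz constant and delta_l the output error introduced by quantizing that layer's weights, to show how depth drives the accumulated bound. All constants below are made-up toy values.

```python
def propagated_error_bound(lipschitz_constants, per_layer_errors):
    """Worst-case output error after passing a perturbation through
    the network, via the recursion E_{l+1} <= L_l * E_l + delta_l.
    A generic perturbation argument, not the paper's exact bound."""
    bound = 0.0
    for L, delta in zip(lipschitz_constants, per_layer_errors):
        bound = L * bound + delta
    return bound

# Toy example: deeper networks with per-layer Lipschitz constant > 1
# see the bound grow quickly with depth.
for depth in (5, 10, 20):
    b = propagated_error_bound([1.5] * depth, [1e-3] * depth)
    print(f"depth={depth:2d}  accumulated bound={b:.4f}")
```

The rapid growth of this naive bound with depth is precisely the behaviour that a depth-sensitive, inter-layer analysis aims to tame.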