🤖 AI Summary
This work proposes an end-to-end differentiable, quantization-aware training method that addresses the significant accuracy degradation commonly observed when post-training quantization is used to deploy deep neural networks on resource-constrained devices. The approach integrates learnable quantization representatives directly into the backward pass and introduces layer-wise regularization terms that encourage weights to cluster naturally around these representatives. By treating the quantization levels as trainable parameters, the method jointly optimizes both the network weights and the quantization values. Evaluated on CIFAR-10 with AlexNet and VGG16, the proposed technique achieves high compression ratios with substantially reduced accuracy loss, bridging the gap between compression through quantization and preservation of model performance.
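The summary does not give the exact form of the regularizer, but the core idea — a per-layer penalty pulling each weight toward its nearest representative, with the representatives themselves updated by gradient descent — can be illustrated with a minimal sketch. Everything here is an assumption for illustration: the quadratic penalty, the names `w`, `c`, `lam`, and the choice of K = 4 levels are not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setting: one layer's weights plus K learnable quantization
# representatives (hypothetical names and penalty form).
w = rng.normal(size=200)           # layer weights
c = np.linspace(-1.0, 1.0, 4)      # K = 4 trainable representatives

lam, lr = 0.1, 0.5                 # regularization strength, learning rate
for _ in range(300):
    # Assign each weight to its nearest representative.
    k = np.argmin((w[:, None] - c[None, :]) ** 2, axis=1)
    # Regularizer R = lam * sum_i (w_i - c_{k(i)})^2.
    # Gradient step on the weights pulls them toward their representative...
    w -= lr * 2 * lam * (w - c[k])
    # ...and an averaged gradient step moves each representative toward the
    # mean of its assigned weights (averaging keeps the step size stable).
    for j in range(len(c)):
        if np.any(k == j):
            c[j] -= lr * 2 * lam * (c[j] - w[k == j].mean())

# After training, every weight sits essentially on one of the K levels, so
# the layer could be stored as K floats plus 2-bit indices.
spread = np.max(np.abs(w - c[np.argmin((w[:, None] - c[None, :]) ** 2, axis=1)]))
print(f"max distance to nearest representative: {spread:.2e}")
```

In a real network this penalty would be added to the task loss, so gradients from both terms flow to the weights while the representatives receive gradients only from the regularizer — which is what makes them ordinary trainable parameters in the backward pass.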
📝 Abstract
Deep Neural Networks have reached state-of-the-art performance across numerous domains, but this progress has come at the cost of increasingly large and over-parameterized models, posing serious challenges for deployment on resource-constrained devices. Model compression has therefore become essential, and -- among compression techniques -- weight quantization is widely used and particularly effective, yet it typically introduces a non-negligible accuracy drop. Moreover, it is usually applied to already-trained models, without influencing how the parameter space is explored during the learning phase. In contrast, we introduce per-layer regularization terms that drive weights to form clusters naturally during training, integrating quantization awareness directly into the optimization process. This reduces the accuracy loss typically associated with quantization methods while preserving their compression potential. Furthermore, in our framework the quantization representatives become network parameters, marking, to the best of our knowledge, the first approach to embed quantization parameters directly into the backpropagation procedure. Experiments on CIFAR-10 with AlexNet and VGG16 models confirm the effectiveness of the proposed strategy.