🤖 AI Summary
Existing neural image/video codecs underutilize two optimization opportunities, vector quantization and the entropy gradient available at the decoder, which limits rate-distortion (R-D) performance. This work first shows, theoretically and empirically, that non-uniform scalar quantization cannot outperform uniform scalar quantization, and instead applies predefined optimal uniform vector quantization to off-the-shelf codecs. It then reveals a strong correlation between the entropy gradient, which the decoder can compute, and the reconstruction error gradient, which it cannot, and uses the former as a proxy for the latter to refine the decoded output. Evaluated across multiple mainstream pretrained models, these techniques save 1–3% of the bit rate on average at equivalent reconstruction quality, and the entropy-gradient approach also significantly improves traditional codecs.
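The entropy-gradient idea can be illustrated with a toy experiment. In this sketch (not the paper's exact procedure) the entropy model is a standard normal, the quantizer is plain rounding, and the step size `ALPHA` is a hand-picked illustrative value; the point is only that a small step against the gradient of `-log p(y_hat)`, which the decoder can always compute, reduces reconstruction MSE without any side information.

```python
import numpy as np

# Toy setup: the decoder receives rounded latents y_hat and knows the entropy
# model p (here N(0, 1)), but not the original latents y. For a standard
# normal, the gradient of -log p at y_hat is simply y_hat.
rng = np.random.default_rng(0)
y = rng.normal(size=200_000)          # "true" latents (unknown to the decoder)
y_hat = np.round(y)                   # dequantized latents at the decoder

ALPHA = 0.05                          # illustrative step size, not tuned
grad_entropy = y_hat                  # d(-log p)/dy evaluated at y_hat
y_refined = y_hat - ALPHA * grad_entropy

mse_plain = np.mean((y - y_hat) ** 2)       # ~1/12, the usual rounding MSE
mse_refined = np.mean((y - y_refined) ** 2)
print(f"MSE without refinement: {mse_plain:.4f}")
print(f"MSE with entropy-gradient step: {mse_refined:.4f}")
```

The step works here because quantized values are, on average, slightly farther from the distribution's mode than the original latents, so the entropy gradient points in a direction correlated with the true error gradient.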
📝 Abstract
End-to-end image and video codecs are becoming increasingly competitive with traditional compression techniques that have been refined through decades of manual engineering. These trainable codecs offer several advantages over traditional techniques, such as straightforward adaptation to perceptual distortion metrics and strong performance in specialized domains thanks to their ability to learn from data. However, current state-of-the-art neural codecs do not fully exploit the benefits of vector quantization or the availability of the entropy gradient at the decoder. In this paper, we leverage these two properties (vector quantization and the entropy gradient) to improve the performance of off-the-shelf codecs. First, we demonstrate that non-uniform scalar quantization cannot improve performance over uniform scalar quantization, and we therefore propose predefined optimal uniform vector quantization instead. Second, we show that the entropy gradient, which is available at the decoder, is correlated with the reconstruction error gradient, which is not, and we use the former as a proxy to enhance compression performance. Our experiments show that these approaches save between 1% and 3% of the rate at the same quality across various pretrained methods; in addition, the entropy-gradient-based solution significantly improves the performance of traditional codecs as well.
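The advantage of uniform vector quantization over per-component (scalar) rounding can be seen even in two dimensions. The abstract does not specify the quantizer design, so as an assumed stand-in this sketch uses the hexagonal A2 lattice, the optimal uniform lattice in 2-D. Both quantizers are scaled to unit cell volume so they spend the same rate; the hexagonal cell's lower normalized second moment then yields a measurably lower MSE.

```python
import numpy as np

# Hexagonal lattice basis (columns), scaled so the cell volume det(B) = 1,
# matching the unit-volume cell of per-component rounding (square lattice).
B = np.sqrt(2.0 / np.sqrt(3.0)) * np.array([[1.0, 0.5],
                                            [0.0, np.sqrt(3.0) / 2.0]])
B_inv = np.linalg.inv(B)

def quantize_hex(x):
    """Nearest hexagonal-lattice point, found by checking the 3x3
    neighbourhood of the rounded lattice coordinates (sufficient for A2)."""
    u0 = np.round(B_inv @ x)
    best, best_d = None, np.inf
    for di in (-1, 0, 1):
        for dj in (-1, 0, 1):
            p = B @ (u0 + np.array([di, dj]))
            d = np.sum((x - p) ** 2)
            if d < best_d:
                best, best_d = p, d
    return best

rng = np.random.default_rng(1)
X = rng.uniform(-20, 20, size=(20_000, 2))    # smooth source, high-rate regime

err_scalar = X - np.round(X)                  # square lattice (scalar rounding)
err_hex = np.array([x - quantize_hex(x) for x in X])

mse_scalar = np.mean(np.sum(err_scalar ** 2, axis=1))  # ~2/12 per 2-D vector
mse_hex = np.mean(np.sum(err_hex ** 2, axis=1))        # a few percent lower
print(f"square lattice MSE: {mse_scalar:.4f}")
print(f"hex lattice MSE:    {mse_hex:.4f}")
```

The gap here (~4%) is the classical granular gain of the hexagonal cell over the square cell; higher-dimensional lattices widen it further, which is the intuition behind preferring vector over scalar quantization at equal rate.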