AI Summary
This work addresses the storage, transmission, and deployment challenges posed by the massive parameter counts of large language models by introducing, for the first time, a systematic application of modern video compression techniques to model weight quantization. The proposed method integrates affine quantization with advanced video coding standards such as VVC/H.266, naturally aligning with the structural properties of weight matrices without requiring fine-tuning or calibration data. It demonstrates strong generalization across diverse tensor types. Experimental results on the LLaMA-3-8B model at 2-bit compression show a more than 1.5× reduction in perplexity and a 21% improvement in downstream task accuracy compared to existing approaches, substantiating the method's efficiency, robustness, and broad applicability.
Abstract
The rapid development of large language models (LLMs) has led to remarkable advances in natural language processing. However, the increasing scale of these models introduces substantial challenges in storage, transmission, and deployment. Although great effort has been devoted to model compression and quantization, existing methods often rely on fine-tuning or calibration data and exhibit limited generalization across different tensor types. In this paper, we argue that video codecs offer a promising solution for LLM compression because of their inherent compatibility with matrix-structured data, their configurable compression strategies, and the availability of highly optimized, off-the-shelf implementations. We therefore present LLMCodec, a video codec-based LLM compression method that integrates affine quantization with the recent VVC/H.266 video codec. Beyond VVC, we compare a range of video codecs and encoding profiles to evaluate their impact on compression performance. Experiments on different models demonstrate the robustness and generality of LLMCodec. Notably, on LLaMA-3-8B at 2-bit precision, LLMCodec reduces perplexity by over 1.5× and improves downstream task accuracy by 21% compared with the existing method.
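To make the affine quantization step concrete, the following is a minimal sketch of min-max affine quantization that maps a float weight matrix to an integer "frame" and back. It is an illustration under stated assumptions, not LLMCodec's actual pipeline: the function names, the 8-bit sample depth, and the per-tensor min-max scaling are all hypothetical choices, and the video encoding step itself is omitted. In a codec-based scheme, the integer frame would presumably be handed to a video encoder such as a VVC implementation, and the scale and offset stored as side information for decoding.

```python
from typing import List, Tuple

Matrix = List[List[float]]
Frame = List[List[int]]


def affine_quantize(weights: Matrix, bits: int = 8) -> Tuple[Frame, float, float]:
    """Map float weights to integers in [0, 2**bits - 1] with min-max affine scaling.

    Hypothetical helper: the bit depth and per-tensor scaling are assumptions.
    """
    levels = (1 << bits) - 1
    flat = [w for row in weights for w in row]
    w_min, w_max = min(flat), max(flat)
    # Guard against a constant tensor, where the value range would be zero.
    scale = (w_max - w_min) / levels if w_max > w_min else 1.0
    frame = [
        [min(levels, max(0, round((w - w_min) / scale))) for w in row]
        for row in weights
    ]
    return frame, scale, w_min


def affine_dequantize(frame: Frame, scale: float, offset: float) -> Matrix:
    """Invert the affine map: w is approximately q * scale + offset."""
    return [[q * scale + offset for q in row] for row in frame]


# Toy 2x3 "weight matrix"; a real tensor would be far larger.
weights = [[-0.50, 0.00, 0.25], [0.10, -0.20, 0.50]]
frame, scale, offset = affine_quantize(weights)
restored = affine_dequantize(frame, scale, offset)

# Largest absolute reconstruction error from quantization alone.
max_error = max(
    abs(a - b)
    for row_a, row_b in zip(weights, restored)
    for a, b in zip(row_a, row_b)
)
```

In this sketch the only loss comes from rounding, so the error per weight is bounded by half a quantization step. A lossy video codec applied to the frame would add its own distortion on top, in exchange for a lower bitrate.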