LLMCodec: Adapting Video Codecs for Efficient Weight Compression of Large Language Models

📅 2026-06-04

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the storage, transmission, and deployment challenges posed by the massive parameter counts of large language models by introducing, for the first time, a systematic application of modern video compression techniques to model weight compression. The proposed method, LLMCodec, combines affine quantization with an advanced video coding standard, VVC/H.266. Video codecs are naturally suited to the matrix structure of weight tensors, and the method requires neither fine-tuning nor calibration data, so it generalizes well across diverse tensor types. On LLaMA-3-8B at 2-bit compression, it reduces perplexity by more than 1.5× and improves downstream task accuracy by 21% compared with the existing method, supporting its efficiency, robustness, and broad applicability.

📝 Abstract

The rapid development of large language models (LLMs) has led to remarkable advances in natural language processing. However, the increasing scale of these models introduces substantial challenges in terms of storage, transmission, and deployment. Though great efforts have been devoted to model compression and quantization, existing methods often rely on fine-tuning or calibration data and exhibit limited generalization across different tensor types. In this paper, we argue that video codecs offer a promising solution for LLM compression, due to their inherent compatibility with matrix-structured data, configurable compression strategies, and the availability of highly optimized, off-the-shelf implementations. Therefore, we present LLMCodec, a video codec-based LLM compression method that integrates affine quantization with the recent VVC/H.266 video codec. Beyond VVC, we further compare a range of video codecs and encoding profiles to evaluate their impact on compression performance. Experiments on different models demonstrate the robustness and generality of LLMCodec. Notably, on LLaMA-3-8B at 2-bit precision, LLMCodec reduces perplexity by over 1.5× and improves downstream task accuracy by 21% compared with the existing method.
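
The affine quantization step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the per-tensor min/max range, and the 8-bit sample depth are all assumptions chosen for illustration. The idea shown is only that a floating-point weight matrix is mapped to an integer plane that a video encoder such as a VVC/H.266 implementation could then treat as a grayscale frame; the encoder call itself is deliberately left out.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def affine_quantize(weights: Matrix, bits: int = 8) -> Tuple[List[List[int]], float, float]:
    """Map floats to integers in [0, 2**bits - 1] using one scale and one offset.

    Assumption: a single min/max range for the whole tensor (hypothetical choice).
    """
    flat = [w for row in weights for w in row]
    w_min, w_max = min(flat), max(flat)
    levels = (1 << bits) - 1
    # Guard against a constant tensor, where the range would be zero.
    scale = (w_max - w_min) / levels if w_max > w_min else 1.0
    frame = [
        [min(levels, max(0, round((w - w_min) / scale))) for w in row]
        for row in weights
    ]
    return frame, scale, w_min


def affine_dequantize(frame: List[List[int]], scale: float, w_min: float) -> Matrix:
    """Invert the affine map: integer sample -> approximate float weight."""
    return [[v * scale + w_min for v in row] for row in frame]


# Toy 2x3 "weight matrix"; a real tensor would have thousands of rows and columns.
weights = [[-0.50, -0.25, 0.00], [0.25, 0.50, 0.75]]
frame, scale, w_min = affine_quantize(weights)

# In a codec-based pipeline, `frame` would be passed to a video encoder here
# (not shown), and the decoded frame would be dequantized with (scale, w_min).
restored = affine_dequantize(frame, scale, w_min)
max_err = max(
    abs(a - b) for ra, rb in zip(weights, restored) for a, b in zip(ra, rb)
)
```

Without a lossy encoder in the loop, the round-trip error is bounded by half a quantization step (`scale / 2`); any additional error in a full pipeline would come from the codec's own lossy coding.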

Problem

Research questions and friction points this paper is trying to address.

Large Language Models

Model Compression

Weight Quantization

Storage Efficiency

Deployment

Innovation

Methods, ideas, or system contributions that make the work stand out.

video codec

LLM compression

affine quantization