EinSort: Sorting is All We Need for Tensorizing LLM

📅 2026-06-07

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the challenge of efficiently compressing large language model weights and KV caches, which lack explicit structure and thus resist conventional compression techniques. The authors propose an adaptive tensorization method based on index reordering that reveals the intrinsic low-rank structure of these tensors without modifying the underlying model architecture. The approach permutes tensor indices and then combines tensor networks, low-rank approximations, and Einsum operations to compress the reordered tensor. Experimental results on weight and KV-cache compression show that the method improves reconstruction accuracy over existing baselines while also reducing memory footprint and computational overhead.
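To make the role of Einsum in a tensorized layer concrete, here is a minimal NumPy sketch. It is not the paper's implementation: the two-core factorization, the shapes `o1, o2, i1, i2`, and the rank `r` are all assumptions chosen for illustration. It shows that a weight matrix stored as two small cores can be applied to an input by a single einsum contraction, without ever forming the dense matrix.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: a 32x32 weight viewed as a (4, 8, 4, 8) tensor, rank 3.
o1, o2, i1, i2, r = 4, 8, 4, 8, 3

# Two cores of a toy tensorized weight (illustrative, not from the paper).
A = rng.standard_normal((o1, i1, r))
B = rng.standard_normal((r, o2, i2))

# Dense weight implied by the cores:
#   W[(a, b), (i, j)] = sum_r A[a, i, r] * B[r, b, j]
W = np.einsum("air,rbj->abij", A, B).reshape(o1 * o2, i1 * i2)

x = rng.standard_normal(i1 * i2)

# Reference result using the dense matrix.
y_dense = W @ x

# Same result computed from the cores only, with the input reshaped to (i1, i2).
y_cores = np.einsum("air,rbj,ij->ab", A, B, x.reshape(i1, i2)).reshape(-1)

dense_params = W.size
core_params = A.size + B.size
print(np.allclose(y_dense, y_cores), dense_params, core_params)
```

In this toy setting the cores hold fewer parameters than the dense matrix, but only because the weight was constructed to be exactly low-rank in this layout. For real model weights, how well such a factorization fits depends on how the indices are ordered and grouped, which is the question the summarized method targets.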

📝 Abstract

Tensor networks provide efficient representations for compressing large neural networks. By carefully designing shapes and topologies, they can significantly reduce memory and computational costs. However, identifying implicit low-rank structures in large foundation models remains challenging due to their enormous scale and unstructured weight distributions. We propose an adaptive tensorization method that discovers inherent low-rank structure in a target tensor by index ordering. Experiments on weight and KV-cache compression demonstrate improved reconstruction quality compared to baselines.
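The effect of index ordering on low-rank structure can be illustrated with a toy experiment. The sketch below is an assumption-laden illustration, not the paper's algorithm: it takes unstructured Gaussian values, reshapes them into a 64x64 matrix, and compares the rank-4 truncated-SVD error before and after sorting the entries. The helper `truncation_error`, the matrix size, and the rank are hypothetical choices.

```python
import numpy as np

def truncation_error(M, rank):
    """Relative Frobenius error of the best rank-`rank` approximation of M."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    approx = (U[:, :rank] * s[:rank]) @ Vt[:rank]
    return np.linalg.norm(M - approx) / np.linalg.norm(M)

rng = np.random.default_rng(0)

# Unstructured values standing in for a flattened weight tensor.
v = rng.standard_normal(64 * 64)

# Index ordering: a permutation that sorts the entries.
perm = np.argsort(v)

# Same entries, two different index orders, same 64x64 tensorization.
err_raw = truncation_error(v.reshape(64, 64), rank=4)
err_sorted = truncation_error(v[perm].reshape(64, 64), rank=4)

# The permutation is invertible, so the original order can be restored.
inv = np.empty_like(perm)
inv[perm] = np.arange(perm.size)
restored = v[perm][inv]

print(f"rank-4 error, original order: {err_raw:.3f}")
print(f"rank-4 error, sorted order:   {err_sorted:.3f}")
print("original order recovered:", np.array_equal(restored, v))
```

Sorting turns the values into a smooth monotone sequence, so each row of the reshaped matrix is close to a simple function of the column index and a few singular vectors capture most of the energy. Note that a full element-wise permutation like this one costs as much to store as the data itself, so a practical scheme would need a cheaper ordering; how the paper handles this is not stated in the abstract.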

Problem

Research questions and friction points this paper is trying to address.

tensor networks

low-rank structure

large language models

weight compression

KV-cache compression

Innovation

Methods, ideas, or system contributions that make the work stand out.

tensorization

low-rank structure

index ordering