When Is 0.1% Enough? Analyzing the Combined Effects of Dimensionality Reduction and Quantization on Text Embedding Compression

📅 2026-05-31

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the substantial storage and computational overhead of high-dimensional text embeddings by systematically investigating a joint compression strategy that combines dimensionality reduction and quantization, a combination whose effects had not previously been studied in sufficient depth. Evaluated on four MTEB task families and four pretrained embedding models, the combination achieves substantially stronger compression than either technique applied alone. In some settings, embeddings can be reduced to as little as 0.1% of their original size with almost no performance degradation, although the optimal compression strategy depends on the task. The results suggest that combining the two techniques can substantially lower resource consumption in practical deployments, provided the strategy is chosen per task.

📝 Abstract

Recent high-performing text embedding models often output high-dimensional real-valued vectors, resulting in substantial storage and computational costs. To address this issue, compression methods based on dimensionality reduction or quantization have been proposed; however, the effects of combining dimensionality reduction and quantization have not been sufficiently investigated. In this paper, we systematically examine the effectiveness of compressing text embeddings by combining dimensionality reduction and quantization, using four MTEB task families and four pretrained embedding models. The experimental results demonstrate that combining dimensionality reduction and quantization enables substantially stronger compression than using either method alone, that in some settings embeddings can be reduced to as little as 0.1% of their original size with almost no performance degradation, and that the optimal compression strategy depends on the task.
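To make the 0.1% figure concrete, the following is a minimal sketch of one possible way to combine the two steps. The abstract does not name the specific methods evaluated, so PCA (via SVD) and 1-bit sign quantization are stand-ins chosen for illustration, and the 1024-dimensional float32 input, the random data, and the helper names `pca_fit`, `compress`, and `size_ratio` are all assumptions. Under those assumptions, reducing 1024 float32 values (32,768 bits) to 32 one-bit values (32 bits) gives a size ratio of 32 / 32,768, or roughly 0.1%.

```python
import numpy as np

def pca_fit(X, k):
    """Return the mean and top-k principal directions of X (n_samples, dim)."""
    mean = X.mean(axis=0)
    # SVD of the centered data; rows of Vt are the principal directions.
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:k]

def compress(X, mean, components):
    """Project to k dimensions, then keep only the sign of each coordinate."""
    Z = (X - mean) @ components.T      # dimensionality reduction
    bits = Z > 0                       # 1-bit quantization (illustrative choice)
    return np.packbits(bits, axis=1)   # pack 8 bits into each uint8 byte

def size_ratio(orig_dim, orig_bits, new_dim, new_bits):
    """Compressed size divided by original size, both measured in bits."""
    return (new_dim * new_bits) / (orig_dim * orig_bits)

# Hypothetical setup: random vectors standing in for real embeddings.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 1024)).astype(np.float32)

mean, comps = pca_fit(X, 32)
codes = compress(X, mean, comps)
ratio = size_ratio(1024, 32, 32, 1)

print(f"compressed bytes per vector: {codes.shape[1]}")
print(f"size ratio: {ratio:.4%}")
```

The sketch only illustrates the size arithmetic; it says nothing about task performance, which the abstract reports as task-dependent and which would have to be measured on real embeddings.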

Problem

Research questions and friction points this paper is trying to address.

text embedding compression

dimensionality reduction

quantization

storage cost

computational cost

Innovation

Methods, ideas, or system contributions that make the work stand out.

dimensionality reduction

quantization

text embedding compression