Compressing Large Language Models with PCA Without Performance Loss

📅 2025-08-06

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This paper proposes a structured PCA-based input compression framework that enables extreme compression of neural models without sacrificing performance. PCA is applied either to polar-transformed images or segment-wise to token sequences, and the compressed inputs are fed to small models. Three case studies are reported. A one-layer classifier trained on PCA-compressed polar MNIST reaches over 98% accuracy with only 840 parameters. A two-layer Transformer trained on 70-dimensional PCA-reduced MiniLM embeddings reaches 76.62% accuracy on 20 Newsgroups with about 81,000 parameters. A decoder-only Transformer generates coherent token sequences from 70-dimensional PCA embeddings while preserving over 97% cosine similarity with full MiniLM representations, using less than 17% of GPT-2's parameter count.
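The polar transformation of images mentioned here can be illustrated with a minimal sketch. The source does not give the grid size or interpolation scheme, so the function name `to_polar`, the 14 x 32 grid, and the nearest-neighbour lookup below are all assumptions made for illustration, not the authors' implementation.

```python
import numpy as np

def to_polar(img, n_radii=14, n_angles=32):
    """Resample a 2-D grayscale image onto an (n_radii, n_angles) polar grid
    centred on the image, using nearest-neighbour lookup (an assumed choice)."""
    h, w = img.shape
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    max_r = min(cy, cx)
    radii = np.linspace(0.0, max_r, n_radii)
    angles = np.linspace(0.0, 2.0 * np.pi, n_angles, endpoint=False)
    rr, aa = np.meshgrid(radii, angles, indexing="ij")
    # Convert each (radius, angle) pair back to the nearest pixel coordinate.
    ys = np.clip(np.rint(cy + rr * np.sin(aa)).astype(int), 0, h - 1)
    xs = np.clip(np.rint(cx + rr * np.cos(aa)).astype(int), 0, w - 1)
    return img[ys, xs]

# Hypothetical 28x28 input standing in for an MNIST digit.
rng = np.random.default_rng(0)
image = rng.random((28, 28))
polar = to_polar(image)
```

Each row of `polar` holds the samples taken at one radius, so a pattern that is rotationally regular in the original image becomes regular along rows, which is one plausible reason a subsequent PCA could need fewer components.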

📝 Abstract

We demonstrate that Principal Component Analysis (PCA), when applied in a structured manner, either to polar-transformed images or segment-wise to token sequences, enables extreme compression of neural models without sacrificing performance. Across three case studies, we show that a one-layer classifier trained on PCA-compressed polar MNIST achieves over 98 percent accuracy using only 840 parameters. A two-layer transformer trained on 70-dimensional PCA-reduced MiniLM embeddings reaches 76.62 percent accuracy on the 20 Newsgroups dataset with just 81000 parameters. A decoder-only transformer generates coherent token sequences from 70-dimensional PCA embeddings while preserving over 97 percent cosine similarity with full MiniLM representations, using less than 17 percent of the parameter count of GPT-2. These results highlight PCA-based input compression as a general and effective strategy for aligning model capacity with information content, enabling lightweight architectures across multiple modalities.
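As a rough illustration of the 70-dimensional reduction and the cosine-similarity check described above, the sketch below fits PCA with an SVD, compresses vectors to 70 components, reconstructs them, and measures the mean cosine similarity with the originals. The data is synthetic; the 384-dimensional input width, the low-rank structure, and all function names are assumptions, and the resulting number says nothing about the paper's actual results.

```python
import numpy as np

def fit_pca(X, k):
    """Return (mean, components) of a k-component PCA fitted via SVD."""
    mean = X.mean(axis=0)
    # Rows of Vt are principal directions, ordered by decreasing variance.
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:k]

def compress(X, mean, components):
    """Project centred data onto the principal directions."""
    return (X - mean) @ components.T

def reconstruct(Z, mean, components):
    """Map compressed codes back to the original space."""
    return Z @ components + mean

def mean_cosine(A, B):
    """Average row-wise cosine similarity between two matrices."""
    num = np.sum(A * B, axis=1)
    den = np.linalg.norm(A, axis=1) * np.linalg.norm(B, axis=1)
    return float(np.mean(num / den))

rng = np.random.default_rng(0)
# Synthetic stand-in for sentence embeddings: low-rank signal plus small noise.
latent = rng.normal(size=(2000, 50))
mixing = rng.normal(size=(50, 384))
X = latent @ mixing + 0.1 * rng.normal(size=(2000, 384))

mean, comps = fit_pca(X, k=70)
Z = compress(X, mean, comps)          # shape (2000, 70)
X_hat = reconstruct(Z, mean, comps)   # shape (2000, 384)
similarity = mean_cosine(X, X_hat)
```

In practice, `X` would be replaced by real embeddings and the PCA would be fitted on a training split only, so that the similarity is measured on held-out vectors.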

Problem

Research questions and friction points this paper is trying to address.

Compress Large Language Models without losing performance

Apply PCA to reduce model parameters significantly

Maintain high accuracy with lightweight transformer architectures
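To make the parameter budgets concrete, the short sketch below counts the weights of a single linear layer. The source reports 840 parameters for the MNIST classifier but does not state the layer shape, so the configurations shown are only examples that happen to total 840; the helper `linear_params` is hypothetical.

```python
def linear_params(n_in, n_out, bias=True):
    """Number of trainable parameters in one fully connected layer."""
    return n_in * n_out + (n_out if bias else 0)

# Baseline: a linear classifier on raw 28x28 MNIST pixels with 10 classes.
raw_pixels = linear_params(28 * 28, 10)           # 7850

# Two assumed shapes that both give the reported 840 parameters.
no_bias = linear_params(84, 10, bias=False)       # 840
with_bias = linear_params(83, 10, bias=True)      # 840

ratio = 840 / raw_pixels                          # roughly 0.107
```

Under either assumed shape, the compressed classifier uses about a tenth of the parameters of the same layer applied to raw pixels.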

Innovation

Methods, ideas, or system contributions that make the work stand out.

PCA-based input compression shrinks models without sacrificing performance

Structured PCA reduces token sequence dimensions

Lightweight architectures via PCA input compression
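One possible reading of segment-wise PCA on token sequences is sketched below: the sequence of token embeddings is cut into fixed-length segments, each segment is flattened into one vector, and a single PCA is fitted across segments. The source does not specify the procedure, so the segment length, the flattening step, and the name `segmentwise_pca` are assumptions for illustration only.

```python
import numpy as np

def segmentwise_pca(seq, seg_len, k):
    """Compress a (T, d) sequence of token embeddings segment by segment.

    The sequence is split into T // seg_len segments (any remainder is
    dropped), each segment is flattened to length seg_len * d, and the
    segments are projected onto the top-k principal directions.
    """
    T, d = seq.shape
    n_seg = T // seg_len
    segs = seq[: n_seg * seg_len].reshape(n_seg, seg_len * d)
    mean = segs.mean(axis=0)
    _, _, Vt = np.linalg.svd(segs - mean, full_matrices=False)
    comps = Vt[:k]
    codes = (segs - mean) @ comps.T
    return codes, mean, comps

rng = np.random.default_rng(0)
# Hypothetical sequence: 640 tokens with 16-dimensional embeddings.
tokens = rng.normal(size=(640, 16))
codes, seg_mean, seg_comps = segmentwise_pca(tokens, seg_len=8, k=10)
```

Here 80 segments of 128 values each are reduced to 10 values per segment. Note that `k` must not exceed the smaller of the number of segments and `seg_len * d`.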
