🤖 AI Summary
To eliminate the manual tuning or grid search required by per-channel post-training quantization (PTQ) of large language models (LLMs), this paper proposes a parameter-free, automated quantization method. The core idea leverages the geometric properties of symmetric scalar quantization to derive optimal channel-wise scaling factors analytically over a fixed, non-scaled codebook, avoiding heuristic design, back-propagation, and reliance on large calibration datasets. The method supports both symmetric and asymmetric quantization and requires only a single forward pass. On mainstream LLMs, including LLaMA and OPT, the approach is competitive with state-of-the-art methods under stringent settings (e.g., W4A4) while keeping memory footprint and computational overhead low, improving the efficiency and practicality of deploying large models on edge devices.
📝 Abstract
Quantization is a widely used compression technique for reducing the memory and computation costs of large pre-trained models. A key challenge in per-channel post-training quantization (PTQ) is selecting appropriate scaling factors to replace weight values with values from a scaled quantization grid. Existing methods typically fix the scale at the outset via heuristic tuning or grid search. In this note, we propose Beacon, a simple and effective algorithm that eliminates the need for such manual tuning. Beacon performs per-channel PTQ directly using a fixed non-scaled alphabet and automatically determines the optimal scaling factors by exploiting the geometry of symmetric scalar quantization. It supports both symmetric and asymmetric quantization with minimal modifications and does not rely on back-propagation or large calibration sets. Despite its simplicity and tuning-free nature, Beacon achieves competitive performance compared to state-of-the-art methods, making it a practical solution for efficient model deployment.
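The geometric fact behind scale selection can be made concrete. For a weight channel `w` and a quantized assignment `q` drawn from a fixed grid, the squared error `||w - s*q||^2` is minimized at the closed-form scale `s* = <w, q> / <q, q>` (the least-squares projection of `w` onto the direction of `q`). The sketch below is only illustrative, not the paper's algorithm: it alternates nearest-neighbour assignment with this closed-form scale update, whereas Beacon is described as deriving the optimal scale analytically in a single pass. The function name, the max-abs initialization, and the iteration count are all assumptions introduced here.

```python
import numpy as np

def quantize_channel(w, alphabet, iters=3):
    """Hypothetical per-channel symmetric PTQ sketch (not Beacon itself):
    alternate nearest-neighbour assignment onto a fixed, non-scaled
    alphabet with the closed-form least-squares scale s* = <w,q>/<q,q>."""
    a = np.asarray(alphabet, dtype=np.float64)
    s = np.max(np.abs(w)) / np.max(np.abs(a))  # common max-abs initial scale
    # Nearest codeword on the scaled grid for each weight.
    q = a[np.argmin(np.abs(w[:, None] - s * a[None, :]), axis=1)]
    for _ in range(iters):
        denom = np.dot(q, q)
        if denom == 0:
            break
        s = np.dot(w, q) / denom               # optimal scale for fixed q
        q = a[np.argmin(np.abs(w[:, None] - s * a[None, :]), axis=1)]
    return s, q

rng = np.random.default_rng(0)
w = rng.standard_normal(256)                   # one weight channel
alphabet = np.arange(-8, 8)                    # signed 4-bit grid (W4)

# Baseline: max-abs scale with no refinement.
s0 = np.max(np.abs(w)) / 8.0
q0 = alphabet[np.argmin(np.abs(w[:, None] - s0 * alphabet[None, :]), axis=1)]
err0 = np.linalg.norm(w - s0 * q0)

s, q = quantize_channel(w, alphabet)
err = np.linalg.norm(w - s * q)                # never worse than err0
```

Each alternation step is non-increasing in the reconstruction error, so the refined scale can only match or beat the max-abs heuristic on this channel; the point of the paper is that the tuning loop can be dispensed with entirely.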