KANtize: Exploring Low-bit Quantization of Kolmogorov-Arnold Networks for Efficient Inference

📅 2026-03-17
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the high inference cost of Kolmogorov–Arnold Networks (KANs), which stems from evaluating their learnable spline functions, and the open question of whether low-bit quantization is feasible for them. We present the first systematic exploration of efficient KAN inference under 2–3 bit quantization, proposing an inference architecture that replaces recursive spline evaluation with low-bit B-spline coefficients and precomputed lookup tables. This approach substantially improves hardware efficiency while preserving model accuracy: ResKAN18 achieves a 50× reduction in BitOps, 2.9× faster GPU inference, 36% fewer FPGA resources with a 50% higher operating frequency, and a 72% smaller area on a 28nm ASIC alongside a 50% frequency improvement.

📝 Abstract
Kolmogorov-Arnold Networks (KANs) have gained attention for their potential to outperform Multi-Layer Perceptrons (MLPs) in parameter efficiency and interpretability. Unlike traditional MLPs, KANs use learnable non-linear activation functions, typically spline functions expressed as linear combinations of basis splines (B-splines), whose coefficients serve as the model's learnable parameters. Evaluating these spline functions, however, increases computational complexity during inference. Conventional quantization reduces this complexity by lowering the numerical precision of parameters and activations, but its impact on KANs, and especially its effectiveness in reducing computational complexity, remains largely unexplored, particularly at quantization levels below 8 bits. This study investigates how low-bit quantization affects KAN accuracy, computational complexity, and hardware efficiency. Results show that B-splines can be quantized to 2-3 bits with negligible accuracy loss, significantly reducing computational complexity. Building on this, we investigate low-bit quantized precomputed tables as a replacement for the recursive B-spline algorithm, further reducing the computational complexity of KANs and enhancing hardware efficiency while maintaining accuracy. For example, ResKAN18 achieves a 50x reduction in BitOps without loss of accuracy using low-bit-quantized B-spline tables. Additionally, precomputed 8-bit lookup tables improve GPU inference speed by up to 2.9x, while on FPGA-based systolic-array accelerators, reducing B-spline table precision from 8 to 3 bits cuts resource usage by 36%, increases clock frequency by 50%, and improves speedup by 1.24x. On a 28nm FD-SOI ASIC, reducing the B-spline bit-width from 16 to 3 bits yields a 72% area reduction and 50% higher maximum frequency.
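The core idea in the abstract — replacing recursive (Cox-de Boor) B-spline evaluation with a lookup table built from low-bit quantized coefficients — can be sketched as follows. This is a minimal illustration, not the paper's implementation: the uniform knot grid, symmetric uniform coefficient quantization, and nearest-index table lookup are all assumptions made for the sketch, since the paper's exact quantization and indexing scheme is not given here.

```python
def bspline_basis(x, knots, degree):
    """Cox-de Boor recursion: values of all B-spline basis functions at x.

    This is the recursive evaluation the lookup table is meant to replace.
    """
    # Degree-0 bases are indicator functions of the knot intervals.
    B = [1.0 if knots[i] <= x < knots[i + 1] else 0.0
         for i in range(len(knots) - 1)]
    for d in range(1, degree + 1):
        B_next = []
        for i in range(len(B) - 1):
            left = right = 0.0
            if knots[i + d] != knots[i]:
                left = (x - knots[i]) / (knots[i + d] - knots[i]) * B[i]
            if knots[i + d + 1] != knots[i + 1]:
                right = ((knots[i + d + 1] - x)
                         / (knots[i + d + 1] - knots[i + 1]) * B[i + 1])
            B_next.append(left + right)
        B = B_next
    return B

def quantize(coeffs, bits):
    """Symmetric uniform quantization of the coefficients (an assumption;
    the paper's actual quantizer may differ)."""
    scale = max(abs(c) for c in coeffs) / (2 ** (bits - 1) - 1)
    return [round(c / scale) * scale for c in coeffs]

def build_lut(knots, degree, coeffs, x_min, x_max, act_bits=8, coef_bits=3):
    """Precompute the spline activation at 2**act_bits input levels,
    using coef_bits-quantized B-spline coefficients."""
    q = quantize(coeffs, coef_bits)
    n = 2 ** act_bits
    step = (x_max - x_min) / (n - 1)
    xs = [x_min + i * step for i in range(n)]
    lut = [sum(b * c for b, c in zip(bspline_basis(x, knots, degree), q))
           for x in xs]
    return xs, lut

def lut_activation(x, xs, lut):
    """Inference-time replacement for the recursion: a single table lookup."""
    step = xs[1] - xs[0]
    idx = min(max(int(round((x - xs[0]) / step)), 0), len(lut) - 1)
    return lut[idx]
```

At inference time each activation costs one index computation and one memory read instead of the full Cox-de Boor recursion, which is the source of the BitOps and hardware savings the abstract reports; the table has only 2**act_bits entries per spline, so an 8-bit activation grid gives a 256-entry table.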
Problem

Research questions and friction points this paper is trying to address.

Kolmogorov-Arnold Networks
low-bit quantization
computational complexity
hardware efficiency
B-splines
Innovation

Methods, ideas, or system contributions that make the work stand out.

Kolmogorov-Arnold Networks
low-bit quantization
B-spline lookup tables
hardware efficiency
efficient inference