LUT-KAN: Segment-wise LUT Quantization for Fast KAN Inference

📅 2026-01-06
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work proposes LUT-KAN, a novel quantization approach for Kolmogorov–Arnold Networks (KANs), which addresses their high CPU inference cost and quantization difficulty stemming from extensive spline evaluations. LUT-KAN introduces the first piecewise look-up table (LUT) quantization scheme tailored for KANs, integrating int8/uint8 affine quantization with linear interpolation, along with well-defined boundary handling and out-of-range strategies. To isolate the impact of representation efficiency from backend-specific optimizations, the authors further propose an “honest baseline” evaluation methodology. Evaluated on the CICIDS2017 DoS detection task, LUT-KAN achieves 10–12× CPU inference speedup—depending on the backend—while incurring less than a 0.0002 drop in F1 score. At L=64, the memory overhead is approximately 10× that of the original model.
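The summary's core mechanism (per-segment LUTs with int8/uint8 affine quantization, linear interpolation, and an out-of-range policy) can be illustrated with a minimal sketch. This is not the paper's implementation: the function names `build_lut`/`eval_lut`, the uniform sampling grid, and the choice of a clamp OOB policy are illustrative assumptions.

```python
import numpy as np

def build_lut(f, lo, hi, L=64):
    """Sample f on a uniform grid over [lo, hi] and quantize the samples
    to uint8 with a per-table affine (scale, zero-point) scheme.
    Hypothetical sketch of the approach described in the summary."""
    xs = np.linspace(lo, hi, L)
    ys = f(xs)
    span = float(ys.max() - ys.min())
    scale = span / 255.0 if span > 0 else 1.0  # avoid zero scale for constant f
    zero = float(ys.min())
    q = np.clip(np.round((ys - zero) / scale), 0, 255).astype(np.uint8)
    return q, scale, zero, lo, hi

def eval_lut(x, q, scale, zero, lo, hi):
    """Dequantize two neighboring table entries and linearly interpolate.
    Out-of-range inputs are clamped to the table boundary (one possible
    OOB policy; the paper also evaluates alternatives)."""
    L = q.shape[0]
    # Map x into fractional table coordinates, clamping to [0, L-1].
    t = np.clip((x - lo) / (hi - lo) * (L - 1), 0.0, L - 1)
    i0 = np.minimum(t.astype(np.int64), L - 2)  # left neighbor index
    frac = t - i0
    y0 = q[i0].astype(np.float64) * scale + zero
    y1 = q[i0 + 1].astype(np.float64) * scale + zero
    return y0 * (1.0 - frac) + y1 * frac
```

At L=64 a sin-like edge function is reproduced to roughly the uint8 quantization step, which is consistent in spirit with the very small F1 drop reported above.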

📝 Abstract
Kolmogorov–Arnold Networks (KANs) replace scalar weights with learnable univariate functions, often implemented with B-splines. This design can be accurate and interpretable, but it makes inference expensive on CPU because each layer requires many spline evaluations. Standard quantization toolchains are also hard to apply because the main computation is not a matrix multiply but repeated spline basis evaluation. This paper introduces LUT-KAN, a segment-wise lookup-table (LUT) compilation and quantization method for PyKAN-style KAN layers. LUT-KAN converts each edge function into a per-segment LUT with affine int8/uint8 quantization and linear interpolation. The method provides an explicit and reproducible inference contract, including boundary conventions and out-of-bounds (OOB) policies. We propose an "honest baseline" methodology for speed evaluation: B-spline evaluation and LUT evaluation are compared under the same backend optimization (NumPy vs. NumPy and Numba vs. Numba), which separates representation gains from vectorization and JIT effects. Experiments include controlled sweeps over the LUT resolution L ∈ {16, 32, 64, 128} and two quantization schemes (symmetric int8 and asymmetric uint8). We report accuracy, speed, and memory metrics with mean and standard deviation across multiple seeds. A two-by-two OOB robustness matrix evaluates behavior under different boundary modes and OOB policies. In a case study, we compile a trained KAN model for DoS attack detection (CICIDS2017 pipeline) into LUT artifacts. The compiled model preserves classification quality (F1 drop below 0.0002) while reducing steady-state CPU inference latency by 12× under the NumPy backend and 10× under the Numba backend (honest baseline). The memory overhead is approximately 10× at L=64. All code and artifacts are publicly available with fixed release tags for reproducibility.
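The "honest baseline" idea in the abstract, comparing both representations under the same backend so that the measured gap reflects the representation rather than JIT or vectorization effects, can be sketched as a small timing harness. This is an assumed shape, not the paper's actual benchmark code; `honest_compare` and its parameters are illustrative.

```python
import timeit
import numpy as np

def honest_compare(spline_fn, lut_fn, x, repeat=5, number=10):
    """Time a spline-based and a LUT-based evaluator on the same input
    batch under the same backend (here: plain NumPy callables), so the
    ratio isolates representation gains. Returns best-of-repeat seconds
    per call for each evaluator (min over repeats reduces timer noise)."""
    t_spline = min(timeit.repeat(lambda: spline_fn(x),
                                 repeat=repeat, number=number)) / number
    t_lut = min(timeit.repeat(lambda: lut_fn(x),
                              repeat=repeat, number=number)) / number
    return t_spline, t_lut
```

A Numba-vs-Numba comparison would follow the same pattern with both callables JIT-compiled, which is what keeps the 10–12× figures attributable to the LUT representation itself.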
Problem

Research questions and friction points this paper is trying to address.

Kolmogorov–Arnold Networks
inference latency
spline evaluation
quantization
CPU efficiency
Innovation

Methods, ideas, or system contributions that make the work stand out.

LUT-KAN
Kolmogorov–Arnold Networks
lookup-table quantization
segment-wise interpolation
honest baseline evaluation