🤖 AI Summary
This work theoretically compares Kolmogorov–Arnold Networks (KANs) and multilayer perceptrons (MLPs) in terms of expressive power and spectral bias, specifically assessing KANs’ potential as MLP alternatives with improved efficiency in modeling high-frequency components.
Method: Leveraging tools from approximation theory, spectral analysis, and spline interpolation, the study derives representation results relating the two architectures and analyzes their spectral bias during training.
Contribution/Results: The paper shows that any MLP can be represented exactly by a KAN of comparable size, so KANs are at least as expressive as MLPs; conversely, representing a KAN by an MLP inflates the parameter count by a factor of the grid size, suggesting that large-grid KANs can approximate certain functions more parameter-efficiently. On the optimization side, KANs exhibit a weaker spectral bias toward low frequencies, and grid extension of the learnable splines further improves the learning of high-frequency components. Experiments confirm the reduced spectral bias and improved high-frequency approximation accuracy compared with MLPs.
📝 Abstract
Kolmogorov–Arnold Networks (KANs) [Liu et al., 2024] were recently proposed as a potential alternative to the prevalent architectural backbone of many deep learning models, the multi-layer perceptron (MLP). KANs have seen success in various AI-for-science tasks, with their empirical efficiency and accuracy demonstrated in function regression, PDE solving, and many other scientific problems. In this article, we revisit the comparison of KANs and MLPs, with emphasis on a theoretical perspective. On the one hand, we compare the representation and approximation capabilities of KANs and MLPs. We establish that MLPs can be represented using KANs of a comparable size. This shows that the approximation and representation capabilities of KANs are at least as good as those of MLPs. Conversely, we show that KANs can be represented using MLPs, but that in this representation the number of parameters increases by a factor of the KAN grid size. This suggests that KANs with a large grid size may be more efficient than MLPs at approximating certain functions. On the other hand, from the perspective of learning and optimization, we study the spectral bias of KANs compared with MLPs. We demonstrate that KANs are less biased toward low frequencies than MLPs. We highlight that the multi-level learning feature specific to KANs, i.e., grid extension of splines, improves the learning process for high-frequency components. Detailed comparisons with different choices of depth, width, and grid size of KANs are made, shedding some light on how to choose these hyperparameters in practice.
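The grid-size factor in the abstract can be made concrete with a back-of-envelope parameter count. The sketch below is an illustration, not the paper's code: it assumes the usual convention (as in Liu et al., 2024) that each of the `d_in * d_out` edges of a KAN layer carries a learnable spline with `grid + spline_order` coefficients, while an MLP layer has `d_in * d_out` weights plus `d_out` biases. The function names are hypothetical.

```python
def mlp_layer_params(d_in: int, d_out: int) -> int:
    """Parameters of one fully connected MLP layer: weights plus biases."""
    return d_in * d_out + d_out


def kan_layer_params(d_in: int, d_out: int, grid: int, spline_order: int = 3) -> int:
    """Parameters of one KAN layer under the assumed convention:
    each of the d_in * d_out edges has a spline with
    (grid + spline_order) learnable coefficients."""
    return d_in * d_out * (grid + spline_order)


if __name__ == "__main__":
    d = 64
    for g in (5, 10, 20):
        ratio = kan_layer_params(d, d, grid=g) / mlp_layer_params(d, d)
        # The ratio grows roughly linearly in the grid size g,
        # matching the stated factor-of-grid-size blow-up when a
        # KAN is re-expressed as an MLP.
        print(f"grid={g:2d}: KAN/MLP parameter ratio ~ {ratio:.1f}")
```

Under this count, a 64-to-64 KAN layer with grid size 10 holds roughly 13x the parameters of the corresponding MLP layer, which is the flip side of the claim that a single large-grid KAN layer can express functions an MLP would need many more parameters to match.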