On the expressiveness and spectral bias of KANs

📅 2024-10-02
🏛️ arXiv.org
📈 Citations: 2
Influential: 0
📄 PDF
🤖 AI Summary
This work theoretically compares Kolmogorov–Arnold Networks (KANs) and multilayer perceptrons (MLPs) in terms of expressive power and spectral bias, assessing KANs' potential as MLP alternatives with improved efficiency in modeling high-frequency components. Method: leveraging tools from approximation theory, spectral analysis, and spline interpolation, the study establishes rigorous theoretical characterizations of both architectures. Contribution/Results: the authors prove that any MLP can be exactly represented by a KAN of comparable size, so KANs' approximation and representation capabilities are at least as good as MLPs'; conversely, representing a KAN with an MLP inflates the parameter count by a factor of the grid size, suggesting that large-grid KANs can be more parameter-efficient for certain target functions. They further show that KANs' learnable spline grids mitigate the low-frequency bias typical of MLPs. Empirical validation confirms weaker spectral bias and significantly improved high-frequency approximation accuracy compared to MLPs.

📝 Abstract
Kolmogorov-Arnold Networks (KAN) (Liu et al., 2024) were very recently proposed as a potential alternative to the prevalent architectural backbone of many deep learning models, the multi-layer perceptron (MLP). KANs have seen success in various tasks of AI for science, with their empirical efficiency and accuracy demonstrated in function regression, PDE solving, and many more scientific problems. In this article, we revisit the comparison of KANs and MLPs, with emphasis on a theoretical perspective. On the one hand, we compare the representation and approximation capabilities of KANs and MLPs. We establish that MLPs can be represented using KANs of a comparable size. This shows that the approximation and representation capabilities of KANs are at least as good as MLPs. Conversely, we show that KANs can be represented using MLPs, but that in this representation the number of parameters increases by a factor of the KAN grid size. This suggests that KANs with a large grid size may be more efficient than MLPs at approximating certain functions. On the other hand, from the perspective of learning and optimization, we study the spectral bias of KANs compared with MLPs. We demonstrate that KANs are less biased toward low frequencies than MLPs. We highlight that the multi-level learning feature specific to KANs, i.e. grid extension of splines, improves the learning process for high-frequency components. Detailed comparisons with different choices of depth, width, and grid sizes of KANs are made, shedding some light on how to choose the hyperparameters in practice.
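The grid-extension idea in the abstract can be sketched concretely: each KAN edge carries a learnable spline, and refining its knot grid reproduces the current function exactly while freeing up capacity for higher-frequency detail. Below is a minimal, hypothetical illustration using a degree-1 (piecewise-linear) spline; the actual KANs in the paper use higher-order B-splines with a residual activation, so this is only a sketch of the mechanism, not the paper's implementation.

```python
def linear_spline(x, grid, coefs):
    """Evaluate a piecewise-linear spline with knots `grid` and values
    `coefs` at those knots, clamping outside the grid.
    This stands in for one learnable KAN edge activation."""
    if x <= grid[0]:
        return coefs[0]
    if x >= grid[-1]:
        return coefs[-1]
    for i in range(len(grid) - 1):
        if grid[i] <= x <= grid[i + 1]:
            t = (x - grid[i]) / (grid[i + 1] - grid[i])
            return (1 - t) * coefs[i] + t * coefs[i + 1]

def grid_extension(grid, coefs, factor=2):
    """Refine the knot grid by `factor`. The new coefficients are the old
    spline's values at the new knots, so the represented function is
    unchanged -- but the finer grid can now fit sharper (higher-frequency)
    features when training continues."""
    n = (len(grid) - 1) * factor
    new_grid = [grid[0] + (grid[-1] - grid[0]) * k / n for k in range(n + 1)]
    new_coefs = [linear_spline(g, grid, coefs) for g in new_grid]
    return new_grid, new_coefs

# A coarse 3-knot "hat" spline refined to 5 knots: same function, more knots.
grid, coefs = [0.0, 0.5, 1.0], [0.0, 1.0, 0.0]
fine_grid, fine_coefs = grid_extension(grid, coefs)
assert abs(linear_spline(0.3, fine_grid, fine_coefs)
           - linear_spline(0.3, grid, coefs)) < 1e-9
```

This exact-reproduction property is what makes grid extension a multi-level learning scheme: coarse grids are trained first (capturing low frequencies), then refined without losing what was already learned.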
Problem

Research questions and friction points this paper is trying to address.

Compare the representation capabilities of KANs and MLPs
Analyze KANs' efficiency in function approximation
Study KANs' spectral bias versus MLPs'
Innovation

Methods, ideas, or system contributions that make the work stand out.

KANs as an alternative to MLPs
KANs exhibit less spectral bias than MLPs
Grid extension of splines enables efficient high-frequency learning
Yixuan Wang
California Institute of Technology
Jonathan W. Siegel
Assistant Professor, Texas A&M University
Approximation Theory, Statistics, Machine Learning
Ziming Liu
Massachusetts Institute of Technology, The NSF Institute for Artificial Intelligence and Fundamental Interactions
Thomas Y. Hou
California Institute of Technology
numerical analysis, multiscale problems, nonlinear partial differential equations