🤖 AI Summary
Kolmogorov–Arnold Networks (KANs) exhibit strong theoretical expressivity but suffer from high computational cost and weak classification performance. To address these limitations, we propose MJKAN—a hybrid architecture that integrates KAN’s nonlinear modeling capability with the efficient inference of multilayer perceptrons (MLPs). Our core innovation is a Feature-wise Linear Modulation (FiLM)-style mechanism that jointly modulates learnable univariate functions and radial basis function (RBF) activations at the feature level, preserving KAN’s structural advantages while drastically reducing computational complexity. Experiments demonstrate that MJKAN significantly outperforms MLPs on function regression tasks, achieves comparable accuracy to MLPs on image and text classification benchmarks, and thus exhibits strong cross-task adaptability. Furthermore, we identify the number of RBF basis functions as a critical factor governing generalization performance. This work establishes a new paradigm for designing neural networks that simultaneously achieve high expressivity and computational efficiency.
📝 Abstract
Kolmogorov–Arnold Networks (KANs) have garnered attention for replacing fixed activation functions with learnable univariate functions, but they exhibit practical limitations, including high computational costs and performance deficits in general classification tasks. In this paper, we propose the Modulation Joint KAN (MJKAN), a novel neural network layer designed to overcome these challenges. MJKAN integrates a Feature-wise Linear Modulation (FiLM)-like mechanism with Radial Basis Function (RBF) activations, creating a hybrid architecture that combines the non-linear expressive power of KANs with the efficiency of Multilayer Perceptrons (MLPs). We empirically validated MJKAN's performance across a diverse set of benchmarks, including function regression, image classification (MNIST, CIFAR-10/100), and natural language processing (AG News, SMS Spam). The results demonstrate that MJKAN achieves superior approximation capability in function regression, significantly outperforming MLPs, with accuracy improving as the number of basis functions increases. Conversely, in image and text classification its performance was competitive with MLPs but revealed a critical dependency on the number of basis functions: a smaller basis size was crucial for better generalization, indicating that model capacity must be carefully tuned to the complexity of the data to prevent overfitting. In conclusion, MJKAN offers a flexible architecture that inherits the theoretical advantages of KANs while improving computational efficiency and practical viability.
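To make the layer design concrete, the following is a minimal sketch of one way a FiLM-modulated RBF layer could be structured. It is an illustration under stated assumptions, not the paper's exact formulation: the Gaussian RBF form, the fixed `[-1, 1]` center grid, the per-feature scale/shift (`gamma`, `beta`) parameterization, and all class and parameter names (`MJKANLayerSketch`, `num_basis`, etc.) are assumptions made for this example.

```python
import math
import random

def rbf_features(x, centers, width):
    """Gaussian RBF expansion of a scalar input:
    phi_j(x) = exp(-((x - c_j) / width)^2)."""
    return [math.exp(-((x - c) / width) ** 2) for c in centers]

class MJKANLayerSketch:
    """Hypothetical MJKAN-style layer: each input feature is expanded
    into `num_basis` RBF activations, modulated FiLM-style by a
    per-feature scale (gamma) and shift (beta), then linearly mixed
    like an MLP layer. (In the FiLM literature gamma/beta are usually
    produced by a conditioning network; treating them as free
    per-feature parameters is a simplification assumed here.)"""

    def __init__(self, in_dim, out_dim, num_basis=5, seed=0):
        rng = random.Random(seed)
        # RBF centers spread evenly over an assumed [-1, 1] input range.
        self.centers = [-1 + 2 * j / (num_basis - 1) for j in range(num_basis)]
        self.width = 2 / (num_basis - 1)
        # FiLM parameters: one (gamma, beta) pair per input feature.
        self.gamma = [rng.uniform(0.5, 1.5) for _ in range(in_dim)]
        self.beta = [rng.uniform(-0.1, 0.1) for _ in range(in_dim)]
        # Linear mixing weights over all modulated basis activations.
        self.W = [[rng.gauss(0, 0.1) for _ in range(in_dim * num_basis)]
                  for _ in range(out_dim)]

    def forward(self, x):
        # Expand each feature into RBF activations, modulate, flatten.
        h = []
        for i, xi in enumerate(x):
            for phi in rbf_features(xi, self.centers, self.width):
                h.append(self.gamma[i] * phi + self.beta[i])
        # Dense linear combination -> output features (MLP-style cost).
        return [sum(w * hj for w, hj in zip(row, h)) for row in self.W]

layer = MJKANLayerSketch(in_dim=3, out_dim=2, num_basis=5)
y = layer.forward([0.2, -0.5, 0.9])  # two output activations
```

The `num_basis` knob in this sketch corresponds to the basis-function count the abstract identifies as the key capacity control: more centers give finer univariate resolution (helpful for regression), while fewer centers constrain capacity (helpful for generalization in classification).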