🤖 AI Summary
This survey examines the expressive power of neural networks, with a focus on how efficiently they approximate functions in various function spaces. Drawing on functional analysis and approximation theory, it traces the development from the universal approximation property of single-hidden-layer networks to quantitative results on approximation rates, parameter efficiency, depth–width trade-offs, and the influence of target-function smoothness. It highlights results showing that deep architectures can achieve superior parameter efficiency for structured function classes, and it reviews recent work on the approximation-theoretic properties of Kolmogorov–Arnold Networks (KANs).
📝 Abstract
Universal approximation theorems provide a mathematical explanation for the expressive power of neural networks. They assert that, under mild conditions on the activation function, feedforward neural networks are dense in broad function classes, such as continuous functions on compact subsets of $\mathbb{R}^d$, $L^p$ spaces, or Sobolev spaces. Over the past four decades, these qualitative universality results have evolved into a rich quantitative theory addressing approximation rates, parameter efficiency, and the role of architectural features such as depth and width. This survey presents several glimpses into this theory. We review classical density results for single-hidden-layer networks, as well as quantitative bounds that relate approximation error to network size and smoothness assumptions on target functions. Particular emphasis is placed on depth–width trade-offs and on results demonstrating that deeper architectures can achieve superior parameter efficiency for structured function classes. In addition to standard feedforward neural networks, we also review recent developments on Kolmogorov–Arnold Networks (KANs), which offer an alternative architectural paradigm and whose approximation-theoretic properties have begun to attract significant theoretical attention.
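As a concrete illustration of the depth–width trade-off mentioned above, the following minimal sketch uses a standard sawtooth construction from the depth-separation literature; it is not taken from the survey itself, and the function names (`tent`, `deep_sawtooth`, `count_linear_pieces`) are hypothetical. Composing a two-unit ReLU "tent" layer `depth` times yields a piecewise-linear function on $[0, 1]$ with $2^{\text{depth}}$ linear pieces using only $2 \cdot \text{depth}$ hidden units. A single-hidden-layer ReLU network with $n$ units on a scalar input has at most $n + 1$ linear pieces, so it would need at least $2^{\text{depth}} - 1$ units to represent the same function exactly.

```python
import numpy as np


def relu(x):
    return np.maximum(x, 0.0)


def tent(x):
    # One hidden layer with two ReLU units.
    # On [0, 1] this equals 2x for x <= 1/2 and 2(1 - x) for x >= 1/2.
    return 2.0 * relu(x) - 4.0 * relu(x - 0.5)


def deep_sawtooth(x, depth):
    # Compose the tent layer `depth` times: 2 * depth hidden units in total.
    for _ in range(depth):
        x = tent(x)
    return x


def count_linear_pieces(depth):
    # Evaluate on a dyadic grid fine enough that every breakpoint
    # (a multiple of 2**-depth) is a grid node, then count slope changes.
    n = 2 ** (depth + 2)
    xs = np.linspace(0.0, 1.0, n + 1)
    ys = deep_sawtooth(xs, depth)
    slopes = np.diff(ys) / np.diff(xs)
    changes = np.count_nonzero(~np.isclose(slopes[1:], slopes[:-1]))
    return changes + 1


if __name__ == "__main__":
    for depth in range(1, 7):
        pieces = count_linear_pieces(depth)
        print(f"depth={depth}  hidden units={2 * depth}  linear pieces={pieces}")
```

The number of pieces grows exponentially in the depth while the parameter count grows only linearly, which is the kind of parameter-efficiency gap for structured functions that the quantitative theory makes precise. The sketch only counts pieces for exact representation; it does not by itself establish an approximation-error lower bound for shallow networks.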