🤖 AI Summary
This work investigates the impact of introducing a “height” dimension on the expressive power and performance of neural networks. Method: We propose a three-dimensional (3D) architecture—wide–deep–high—that constructs a directed hierarchical structure among neurons within each layer, formally defining and quantifying the expressive gain conferred by height. Contribution/Results: Theoretically, under equal parameter count, our 3D network partitions the input space into $O(((2^H-1)W)^K)$ piecewise-linear regions—exponentially more than conventional 2D networks—and achieves polynomial approximation error $O(2^{-2WK})$, significantly improving approximation rates. Empirically, we validate superior performance across 5 synthetic, 15 tabular, and 3 image benchmark datasets for both regression and classification tasks. Our core contribution is establishing height as an independent architectural dimension and deriving its first theoretical expressivity bounds and approximation guarantees.
📝 Abstract
In this work, beyond width and depth, we augment a neural network with a new dimension called height by intra-linking neurons in the same layer to create an intra-layer hierarchy, which gives rise to the notion of height. We call a neural network characterized by width, depth, and height a 3D network. To put a 3D network in perspective, we theoretically and empirically investigate the expressivity of height. We show via bound estimation and explicit construction that given the same number of neurons and parameters, a 3D ReLU network of width $W$, depth $K$, and height $H$ has greater expressive power than a 2D network of width $H imes W$ and depth $K$, extit{i.e.}, $mathcal{O}((2^H-1)W)^K)$ vs $mathcal{O}((HW)^K)$, in terms of generating more pieces in a piecewise linear function. Next, through approximation rate analysis, we show that by introducing intra-layer links into networks, a ReLU network of width $mathcal{O}(W)$ and depth $mathcal{O}(K)$ can approximate polynomials in $[0,1]^d$ with error $mathcal{O}left(2^{-2WK}
ight)$, which improves $mathcal{O}left(W^{-K}
ight)$ and $mathcal{O}left(2^{-K}
ight)$ for fixed width networks. Lastly, numerical experiments on 5 synthetic datasets, 15 tabular datasets, and 3 image benchmarks verify that 3D networks can deliver competitive regression and classification performance.