On Expressivity of Height in Neural Networks

📅 2023-05-11
📈 Citations: 1
Influential: 0
📄 PDF
🤖 AI Summary
This work investigates the impact of introducing a “height” dimension on the expressive power and performance of neural networks. Method: We propose a three-dimensional (3D) architecture—wide–deep–high—that constructs a directed hierarchical structure among neurons within each layer, formally defining and quantifying the expressive gain conferred by height. Contribution/Results: Theoretically, under equal parameter count, our 3D network partitions the input space into $O(((2^H-1)W)^K)$ piecewise-linear regions—exponentially more than conventional 2D networks—and approximates polynomials with error $O(2^{-2WK})$, significantly improving on known approximation rates. Empirically, we validate superior performance across 5 synthetic, 15 tabular, and 3 image benchmark datasets for both regression and classification tasks. Our core contribution is establishing height as an independent architectural dimension and deriving the first theoretical expressivity bounds and approximation guarantees for it.
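The size of the expressivity gap claimed above can be made concrete by taking the ratio of the two region counts quoted in the summary (a direct computation from those bounds, not a result stated on this page):

$$\frac{\big((2^H-1)W\big)^K}{(HW)^K} \;=\; \left(\frac{2^H-1}{H}\right)^K,$$

which, for any fixed depth $K$, grows exponentially in the height $H$.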
📝 Abstract
In this work, beyond width and depth, we augment a neural network with a new dimension called height by intra-linking neurons in the same layer to create an intra-layer hierarchy, which gives rise to the notion of height. We call a neural network characterized by width, depth, and height a 3D network. To put a 3D network in perspective, we theoretically and empirically investigate the expressivity of height. We show via bound estimation and explicit construction that given the same number of neurons and parameters, a 3D ReLU network of width $W$, depth $K$, and height $H$ has greater expressive power than a 2D network of width $H\times W$ and depth $K$, i.e., $\mathcal{O}(((2^H-1)W)^K)$ vs. $\mathcal{O}((HW)^K)$, in terms of generating more pieces in a piecewise linear function. Next, through approximation rate analysis, we show that by introducing intra-layer links into networks, a ReLU network of width $\mathcal{O}(W)$ and depth $\mathcal{O}(K)$ can approximate polynomials in $[0,1]^d$ with error $\mathcal{O}\left(2^{-2WK}\right)$, which improves on the rates $\mathcal{O}\left(W^{-K}\right)$ and $\mathcal{O}\left(2^{-K}\right)$ for fixed-width networks. Lastly, numerical experiments on 5 synthetic datasets, 15 tabular datasets, and 3 image benchmarks verify that 3D networks can deliver competitive regression and classification performance.
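The paper's exact intra-linking construction is not reproduced on this page; as a rough illustration only, one plausible reading is that each neuron's pre-activation additionally receives the activations of neurons computed earlier in the same layer, with a strictly lower-triangular link matrix keeping the within-layer graph a directed hierarchy. A minimal sketch under that assumption (the function name `intra_linked_layer` and the matrix `U` are illustrative, not from the paper):

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def intra_linked_layer(x, W_in, b, U):
    """One ReLU layer with hypothetical intra-layer links.

    W_in, b: the usual inter-layer weights and bias (the 2D part).
    U: strictly lower-triangular matrix of within-layer link weights,
       so the intra-layer dependency graph is acyclic (a hierarchy).
    """
    z = W_in @ x + b          # standard pre-activations from the previous layer
    h = np.zeros_like(z)
    for j in range(z.shape[0]):
        # Neuron j also sees the activations of neurons 0..j-1 in this layer.
        h[j] = relu(z[j] + U[j, :j] @ h[:j])
    return h

# Tiny demo: with U = 0 this reduces to an ordinary ReLU layer.
rng = np.random.default_rng(0)
W_in = rng.standard_normal((6, 4))
b = rng.standard_normal(6)
U = np.tril(rng.standard_normal((6, 6)), k=-1)
x = rng.standard_normal(4)
out = intra_linked_layer(x, W_in, b, U)
```

Setting `U` to zero recovers a conventional 2D layer, which is one way to see that the intra-linked family strictly contains the standard one.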
Problem

Research questions and friction points this paper is trying to address.

Neural Networks
Height Dimension
Approximation Accuracy
Innovation

Methods, ideas, or system contributions that make the work stand out.

3D Neural Network
Height Dimension
Enhanced Expressive Power
Fenglei Fan
Department of Mathematics, The Chinese University of Hong Kong, Hong Kong
Ze-yu Li
Department of Mathematics, The Chinese University of Hong Kong, Hong Kong
Huan Xiong
Harbin Institute of Technology
Combinatorics · Machine Learning
T. Zeng
Department of Mathematics, The Chinese University of Hong Kong, Hong Kong