🤖 AI Summary
This paper addresses the minimal depth required for ReLU neural networks to exactly represent all continuous piecewise-linear (CPWL) functions on ℝⁿ, refuting the conjecture of Hertrich et al. (NeurIPS'21) that ⌈log₂(n+1)⌉ hidden layers are necessary.
Method: Via an explicit construction with a geometric interpretation in terms of polyhedral subdivisions of the simplex, the authors realize the 5-input maximum function with only two hidden layers and establish that ⌈log₃(n−1)⌉+1 hidden layers suffice for all CPWL functions on ℝⁿ.
Contribution/Results: The new depth upper bound improves on all prior results and almost matches the ⌈log₃(n)⌉ lower bound of Averkov, Hojny, and Merkert (ICLR'25), which applies in the special case of ReLU networks whose weights are decimal fractions. This yields a sharper characterization of the depth needed for exact CPWL representation by ReLU networks.
📝 Abstract
This work studies the expressivity of ReLU neural networks with a focus on their depth. A sequence of previous works showed that $\lceil\log_2(n+1)\rceil$ hidden layers are sufficient to compute all continuous piecewise linear (CPWL) functions on $\mathbb{R}^n$. Hertrich, Basu, Di Summa, and Skutella (NeurIPS'21) conjectured that this result is optimal in the sense that there are CPWL functions on $\mathbb{R}^n$, like the maximum function, that require this depth. We disprove the conjecture and show that $\lceil\log_3(n-1)\rceil+1$ hidden layers are sufficient to compute all CPWL functions on $\mathbb{R}^n$. A key step in the proof is that ReLU neural networks with two hidden layers can exactly represent the maximum function of five inputs. More generally, we show that $\lceil\log_3(n-2)\rceil+1$ hidden layers are sufficient to compute the maximum of $n\geq 4$ numbers. Our constructions almost match the $\lceil\log_3(n)\rceil$ lower bound of Averkov, Hojny, and Merkert (ICLR'25) in the special case of ReLU networks with weights that are decimal fractions. The constructions have a geometric interpretation via polyhedral subdivisions of the simplex into ``easier'' polytopes.
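For context on the prior $\lceil\log_2(n+1)\rceil$-type bounds that the paper improves: the classical construction rests on the fact that the maximum of two numbers is exactly computable with one hidden ReLU layer, via $\max(a,b) = a + \mathrm{ReLU}(b-a)$, and a pairwise reduction tree then computes the maximum of $n$ numbers with $\lceil\log_2 n\rceil$ hidden layers. A minimal sketch of this background construction (plain Python, not code from the paper; `max2` and `max_tree` are illustrative names):

```python
def relu(t):
    # Rectified linear unit: the network's activation function.
    return max(t, 0.0)

def max2(a, b):
    # Exact identity: max(a, b) = a + ReLU(b - a).
    # Realizable with a single hidden layer, since the linear
    # skip term a can itself be written as ReLU(a) - ReLU(-a).
    return a + relu(b - a)

def max_tree(xs):
    # Pairwise reduction: each round corresponds to one hidden
    # layer, so n inputs need ceil(log2(n)) hidden layers.
    xs = list(xs)
    while len(xs) > 1:
        xs = [max2(xs[i], xs[i + 1]) if i + 1 < len(xs) else xs[i]
              for i in range(0, len(xs), 2)]
    return xs[0]
```

The paper's contribution is precisely that this binary tree is not optimal: a two-hidden-layer network can already take the maximum of five (not just four) inputs, which cascades into the improved $\lceil\log_3(n-2)\rceil+1$ depth bound.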