🤖 AI Summary
This paper addresses the minimal depth required for ReLU neural networks to exactly represent all continuous piecewise-linear (CPWL) functions on ℝⁿ, refuting the conjecture of Hertrich et al. (NeurIPS'21) that ⌈log₂(n+1)⌉ hidden layers are necessary.
Method: Via an explicit construction with a geometric interpretation in terms of polyhedral subdivisions of the simplex, the authors realize the 5-input maximum function with only two hidden layers and establish that ⌈log₃(n−1)⌉+1 hidden layers suffice for all CPWL functions on ℝⁿ.
Contribution/Results: The new depth upper bound improves on all prior results and almost matches the ⌈log₃(n)⌉ lower bound of Averkov, Hojny, and Merkert (ICLR'25), which applies in the special case of ReLU networks whose weights are decimal fractions. This yields a sharper characterization of the depth needed for exact CPWL representation by ReLU networks.
📝 Abstract
This work studies the expressivity of ReLU neural networks with a focus on their depth. A sequence of previous works showed that $\lceil\log_2(n+1)\rceil$ hidden layers are sufficient to compute all continuous piecewise linear (CPWL) functions on $\mathbb{R}^n$. Hertrich, Basu, Di Summa, and Skutella (NeurIPS'21) conjectured that this result is optimal in the sense that there are CPWL functions on $\mathbb{R}^n$, like the maximum function, that require this depth. We disprove the conjecture and show that $\lceil\log_3(n-1)\rceil+1$ hidden layers are sufficient to compute all CPWL functions on $\mathbb{R}^n$. A key step in the proof is that ReLU neural networks with two hidden layers can exactly represent the maximum function of five inputs. More generally, we show that $\lceil\log_3(n-2)\rceil+1$ hidden layers are sufficient to compute the maximum of $n\geq 4$ numbers. Our constructions almost match the $\lceil\log_3(n)\rceil$ lower bound of Averkov, Hojny, and Merkert (ICLR'25) in the special case of ReLU networks with weights that are decimal fractions. The constructions have a geometric interpretation via polyhedral subdivisions of the simplex into ``easier'' polytopes.
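For context on the prior $\lceil\log_2(n+1)\rceil$-type bounds that the paper improves: the classical construction rests on the fact that the maximum of two numbers is exactly computable with one hidden ReLU layer, via $\max(a,b) = a + \mathrm{ReLU}(b-a)$, and a pairwise reduction tree then computes the maximum of $n$ numbers with $\lceil\log_2 n\rceil$ hidden layers. A minimal sketch of this background construction (plain Python, not code from the paper; `max2` and `max_tree` are illustrative names):

```python
def relu(t):
    # Rectified linear unit: the network's activation function.
    return max(t, 0.0)

def max2(a, b):
    # Exact identity: max(a, b) = a + ReLU(b - a).
    # Realizable with a single hidden layer, since the linear
    # skip term a can itself be written as ReLU(a) - ReLU(-a).
    return a + relu(b - a)

def max_tree(xs):
    # Pairwise reduction: each round corresponds to one hidden
    # layer, so n inputs need ceil(log2(n)) hidden layers.
    xs = list(xs)
    while len(xs) > 1:
        xs = [max2(xs[i], xs[i + 1]) if i + 1 < len(xs) else xs[i]
              for i in range(0, len(xs), 2)]
    return xs[0]
```

The paper's contribution is precisely that this binary tree is not optimal: a two-hidden-layer network can already take the maximum of five (not just four) inputs, which cascades into the improved $\lceil\log_3(n-2)\rceil+1$ depth bound.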