From Dead Neurons to Deep Approximators: Deep Bernstein Networks as a Provable Alternative to Residual Layers

📅 2026-02-04
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
This work addresses the vanishing gradient and "dead neuron" issues commonly observed in deep neural networks with piecewise linear activation functions, as well as the architectural constraints imposed by residual connections. To overcome these limitations, the authors propose Deep Bernstein Networks, which introduce Bernstein polynomials as activation functions into deep learning for the first time. The proposed method ensures stable gradient flow without requiring residual connections, and theoretical analysis establishes a strictly positive lower bound on the local derivatives of the activation function. Moreover, it achieves exponential decay in function approximation error, surpassing the polynomial approximation barrier inherent to ReLU-based networks. Experimental results on the HIGGS and MNIST datasets demonstrate a dramatic reduction in dead neurons (from over 90% to below 5%) and consistently superior performance compared to ReLU, Leaky ReLU, SeLU, and GeLU.


๐Ÿ“ Abstract
Residual connections are the de facto standard for mitigating vanishing gradients, yet they impose structural constraints and fail to address the inherent inefficiencies of piecewise linear activations. We show that Deep Bernstein Networks, which use Bernstein polynomials as activation functions, can act as a residual-free architecture while simultaneously optimizing trainability and representational power. We provide a two-fold theoretical foundation for our approach. First, we derive a theoretical lower bound on the local derivative, proving it remains strictly bounded away from zero. This directly addresses the root cause of gradient stagnation; empirically, our architecture reduces "dead" neurons from 90% in standard deep networks to less than 5%, outperforming ReLU, Leaky ReLU, SeLU, and GeLU. Second, we establish that the approximation error for Bernstein-based networks decays exponentially with depth, a significant improvement over the polynomial rates of ReLU-based architectures. By unifying these results, we demonstrate that Bernstein activations provide a superior mechanism for function approximation and signal flow. Our experiments on HIGGS and MNIST confirm that Deep Bernstein Networks achieve high-performance training without skip connections, offering a principled path toward deep, residual-free architectures with enhanced expressive capacity.
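To make the core idea concrete, here is a minimal sketch of what a Bernstein-polynomial activation could look like. The paper's actual parameterization (polynomial degree, coefficient initialization, how pre-activations are mapped into [0, 1]) is not given in this summary, so the squashing function, degree, and coefficient choice below are all illustrative assumptions, not the authors' implementation.

```python
import numpy as np
from math import comb

def bernstein_basis(x, n):
    """Evaluate all degree-n Bernstein basis polynomials B_{k,n}(x) at x in [0, 1]."""
    return np.stack([comb(n, k) * x**k * (1 - x)**(n - k) for k in range(n + 1)])

def bernstein_activation(x, coeffs):
    """Hypothetical activation: sigma(x) = sum_k c_k * B_{k,n}(s(x)).

    s(x) is an assumed sigmoid squashing of pre-activations into [0, 1];
    in a trainable network the coefficients c_k would be learned per layer.
    """
    n = len(coeffs) - 1
    s = 1.0 / (1.0 + np.exp(-x))  # map pre-activations into [0, 1]
    return np.tensordot(coeffs, bernstein_basis(s, n), axes=1)

# With strictly increasing coefficients the resulting activation is strictly
# monotone, so its derivative stays bounded away from zero on any compact
# input range -- the property the paper's lower-bound analysis formalizes.
coeffs = np.linspace(-1.0, 1.0, 6)   # degree-5 polynomial, illustrative values
x = np.linspace(-4.0, 4.0, 9)
y = bernstein_activation(x, coeffs)
```

Because the Bernstein basis forms a partition of unity, the output is a convex-combination-style blend of the coefficients, which is what keeps local derivatives well behaved compared with a ReLU's hard zero region.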
Problem

Research questions and friction points this paper is trying to address.

vanishing gradients
dead neurons
residual connections
piecewise linear activations
function approximation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Deep Bernstein Networks
Bernstein polynomials
gradient stagnation
exponentially decaying approximation error
residual-free architecture
🔎 Similar Papers
No similar papers found.
Ibrahim Albool
Department of Electrical Engineering and Computer Science, University of California, Irvine, USA
Malak Gamal El-Din
Department of Electrical Engineering and Computer Science, University of California, Irvine, USA
Salma Elmalaki
Department of Electrical Engineering and Computer Science, University of California, Irvine, USA
Human Factors · CPS · Mobile Computing · Extended Reality
Yasser Shoukry
Department of Electrical Engineering and Computer Science, University of California, Irvine, USA