Mitigating the Curse of Dimensionality in Uniform Convergence of Deep Neural Networks via Smooth Activations

📅 2026-06-03

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Least-squares estimators built on ReLU networks can suffer from the curse of dimensionality in uniform convergence, which limits their suitability for tasks that require worst-case reliability. This work analyzes deep neural networks with smooth activation functions, covering both feedforward and residual architectures, and develops a unified theoretical framework that establishes novel pseudo-dimension bounds, non-asymptotic approximation guarantees, and Hölder-norm bounds. The analysis shows that these networks can mitigate the curse of dimensionality by adaptively exploiting the low-dimensional hierarchical composition structure of the target function. Non-asymptotic uniform convergence rates are derived for Huber, least-squares, quantile, and logistic regression. Simulation studies and a real-world application support the theoretical results.
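For readers less familiar with the four regression settings named above, the following is a minimal sketch of the textbook per-sample loss for each. The function names, the Huber threshold `delta`, the quantile level `tau`, and the {0, 1} label coding are illustrative assumptions; the paper's exact parameterization is not given in this summary.

```python
import math


def least_squares_loss(y: float, pred: float) -> float:
    """Squared error (y - pred)^2."""
    return (y - pred) ** 2


def huber_loss(y: float, pred: float, delta: float = 1.0) -> float:
    """Quadratic for small residuals, linear beyond the threshold delta (assumed value)."""
    r = abs(y - pred)
    if r <= delta:
        return 0.5 * r * r
    return delta * (r - 0.5 * delta)


def quantile_loss(y: float, pred: float, tau: float = 0.5) -> float:
    """Pinball (check) loss at quantile level tau in (0, 1)."""
    r = y - pred
    return max(tau * r, (tau - 1.0) * r)


def logistic_loss(y: int, logit: float) -> float:
    """Negative Bernoulli log-likelihood for a label y in {0, 1} and a real-valued logit."""
    # Numerically stable evaluation of log(1 + exp(logit)).
    softplus = max(logit, 0.0) + math.log1p(math.exp(-abs(logit)))
    return softplus - y * logit
```

In each setting, an estimator would minimize the average of the chosen loss over the training sample, with `pred` (or `logit`) being the network output.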

📝 Abstract

This paper establishes a theoretical framework for the uniform convergence of smoothly activated deep neural network (DNN) estimators. While standard ReLU networks achieve minimax-optimal rates in the $L^2(P)$ norm for various nonparametric regression tasks, we establish a theoretical lower bound demonstrating that least-squares ReLU estimators can suffer from the curse of dimensionality in their uniform convergence behavior. Motivated by the need for reliable uniform guarantees in downstream tasks requiring worst-case reliability, we address this limitation by analyzing smoothly activated DNNs (smooth DNNs), encompassing both feedforward and residual structures. We establish novel pseudo-dimension bounds, non-asymptotic approximation guarantees, and Hölder-norm bounds for the approximators of these models. Leveraging these results, we derive non-asymptotic uniform convergence rates for smooth DNN estimators across multiple statistical contexts, including Huber, least-squares, quantile, and logistic regression. We prove that smooth DNNs can mitigate the curse of dimensionality in uniform convergence by adaptively exploiting the low-dimensional hierarchical composition structure of the target function. Supported by both simulation studies and a real-world application, our results position smooth DNNs as a theoretically grounded and practically viable alternative to ReLU networks for statistical learning tasks requiring uniform guarantees.
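The distinction between $L^2(P)$ error and uniform (sup-norm) error drawn above can be illustrated with a toy example. The sketch below is not from the paper: it assumes $P$ is uniform on $[0, 1]$, approximates both norms on a grid, and uses a hypothetical estimate `f_hat` that matches the target `f0` everywhere except on a narrow bump.

```python
import math


def l2_error(f_hat, f0, grid):
    """Grid approximation of the L2(P) error, assuming P is uniform on the grid's range."""
    return math.sqrt(sum((f_hat(x) - f0(x)) ** 2 for x in grid) / len(grid))


def sup_error(f_hat, f0, grid):
    """Grid approximation of the uniform error sup_x |f_hat(x) - f0(x)|."""
    return max(abs(f_hat(x) - f0(x)) for x in grid)


def f0(x):
    """Hypothetical target function: identically zero."""
    return 0.0


def f_hat(x, width=0.01):
    """Hypothetical estimate: a triangular bump of height 1 centred at 0.5."""
    return max(0.0, 1.0 - abs(x - 0.5) / width)


grid = [i / 10000 for i in range(10001)]
l2 = l2_error(f_hat, f0, grid)
sup = sup_error(f_hat, f0, grid)
print(f"L2 error ~ {l2:.4f}, sup error = {sup:.1f}")
```

The bump contributes little to the average squared error but sets the worst-case error to its full height, which is why a good $L^2(P)$ rate alone does not give a uniform guarantee.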

Problem

Research questions and friction points this paper is trying to address.

curse of dimensionality

uniform convergence

deep neural networks

smooth activations

nonparametric regression

Innovation

Methods, ideas, or system contributions that make the work stand out.

smooth activations

uniform convergence

curse of dimensionality