🤖 AI Summary
Least-squares ReLU networks attain minimax-optimal rates in the $L^2(P)$ norm, yet a lower bound in this work shows that they can suffer from the curse of dimensionality in uniform convergence, which limits their suitability for tasks requiring worst-case reliability. The paper analyzes deep neural networks with smooth activation functions, covering both feedforward and residual architectures, and develops a unified theoretical framework that establishes novel pseudo-dimension bounds, non-asymptotic approximation guarantees, and Hölder-norm bounds. The analysis shows that these networks can mitigate the curse of dimensionality by adaptively exploiting the low-dimensional hierarchical composition structure of the target function. For Huber, least-squares, quantile, and logistic regression, the resulting estimators achieve non-asymptotic uniform convergence rates. Simulation studies and a real-world application support smooth DNNs as a practical alternative to ReLU networks when uniform guarantees are required.
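For readers less familiar with the four regression settings named above, the following is a minimal sketch of their pointwise loss functions in standard textbook form. The function names, the Huber threshold `delta`, the quantile level `tau`, and the `{0, 1}` label coding for logistic regression are illustrative assumptions; the paper's exact parameterizations may differ.

```python
import math


def squared_loss(y, f):
    """Least-squares loss for a response y and a prediction f."""
    return (y - f) ** 2


def huber_loss(y, f, delta=1.0):
    """Huber loss: quadratic for small residuals, linear beyond delta (assumed threshold)."""
    r = abs(y - f)
    if r <= delta:
        return 0.5 * r * r
    return delta * (r - 0.5 * delta)


def quantile_loss(y, f, tau=0.5):
    """Check (pinball) loss at an assumed quantile level tau in (0, 1)."""
    r = y - f
    indicator = 1.0 if r < 0 else 0.0
    return r * (tau - indicator)


def logistic_loss(y, f):
    """Negative Bernoulli log-likelihood for a label y in {0, 1} and a logit f.

    Uses the numerically stable form of log(1 + exp(f)) - y * f.
    """
    return max(f, 0.0) + math.log1p(math.exp(-abs(f))) - y * f


if __name__ == "__main__":
    for name, value in [
        ("squared", squared_loss(2.0, 0.0)),
        ("huber", huber_loss(0.0, 3.0)),
        ("quantile", quantile_loss(1.0, 0.0, tau=0.9)),
        ("logistic", logistic_loss(1, 0.0)),
    ]:
        print(name, value)
```

In each setting the estimator would be obtained by minimizing the empirical average of the corresponding loss over a class of networks; the sketch only illustrates the losses themselves, not the estimation procedure studied in the paper.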
📝 Abstract
This paper establishes a theoretical framework for the uniform convergence of smoothly activated deep neural network (DNN) estimators. While standard ReLU networks achieve minimax-optimal rates in the $L^2(P)$ norm for various nonparametric regression tasks, we establish a theoretical lower bound demonstrating that least-squares ReLU estimators can suffer from the curse of dimensionality in their uniform convergence behavior. Motivated by the need for reliable uniform guarantees in downstream tasks requiring worst-case reliability, we address this limitation by analyzing smoothly activated DNNs (smooth DNNs), encompassing both feedforward and residual structures. We establish novel pseudo-dimension bounds, non-asymptotic approximation guarantees, and Hölder-norm bounds for the approximators of these models. Leveraging these results, we derive non-asymptotic uniform convergence rates for smooth DNN estimators across multiple statistical contexts, including Huber, least-squares, quantile, and logistic regression. We prove that smooth DNNs can mitigate the curse of dimensionality in uniform convergence by adaptively exploiting the low-dimensional hierarchical composition structure of the target function. Supported by both simulation studies and a real-world application, our results position smooth DNNs as a theoretically grounded and practically viable alternative to ReLU networks for statistical learning tasks requiring uniform guarantees.
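The gap between $L^2(P)$ and uniform guarantees can be made concrete with a toy example. The sketch below is not taken from the paper: it assumes a hypothetical estimation error shaped like a narrow triangular bump on $[0, 1]$ with $P$ uniform, and approximates both norms on a grid. As the bump narrows, the $L^2(P)$ error shrinks while the sup-norm error stays at 1, which is why a good $L^2(P)$ rate alone says little about worst-case behavior.

```python
import math


def spike(x, width):
    """Hypothetical estimation error: a triangular bump of height 1 centred at 0.5."""
    return max(0.0, 1.0 - abs(x - 0.5) / width)


def make_grid(n_grid=100_001):
    """Equally spaced grid on [0, 1]; with the default size it contains 0.5 exactly."""
    return [i / (n_grid - 1) for i in range(n_grid)]


def l2_error(err, grid):
    """Grid approximation of the L2(P) norm of err, with P uniform on [0, 1]."""
    return math.sqrt(sum(err(x) ** 2 for x in grid) / len(grid))


def sup_error(err, grid):
    """Grid approximation of the uniform (sup) norm of err on [0, 1]."""
    return max(abs(err(x)) for x in grid)


def compare(widths=(0.1, 0.01, 0.001)):
    """Return (width, L2 error, sup error) for each assumed bump width."""
    grid = make_grid()
    rows = []
    for w in widths:
        err = lambda x, w=w: spike(x, w)  # bind w for this iteration
        rows.append((w, l2_error(err, grid), sup_error(err, grid)))
    return rows


if __name__ == "__main__":
    for width, l2, sup in compare():
        print(f"width={width:<6} L2 error={l2:.4f} sup error={sup:.4f}")
```

For this bump the exact $L^2(P)$ norm is $\sqrt{2w/3}$ for width $w$, so it vanishes as $w \to 0$ even though the error at $x = 0.5$ never improves.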