Optimal Convergence Rates of Deep Neural Network Classifiers

📅 2025-06-17
📈 Citations: 0
Influential: 0
🤖 AI Summary
This paper investigates the optimal convergence rate of deep neural network classifiers for high-dimensional binary classification under the Tsybakov noise condition and a compositional structure assumption. Inputs lie in $[0,1]^d$, and the conditional class probability is modeled as a composition of vector-valued multivariate functions whose components are either maximum functions or Hölder-$\beta$ smooth functions depending on only $d_*$ of their input variables. Methodologically, the authors combine the Tsybakov noise model with a generalized oracle inequality and an analysis of ReLU networks trained by minimizing the empirical hinge risk, and derive, for the first time, an explicit optimal convergence rate for the excess 0-1 risk that is independent of the input dimension $d$. Key contributions: (1) establishing a matching lower bound, showing the rate is tight and free of the curse of dimensionality; (2) proving that ReLU deep networks achieve this optimal rate up to logarithmic factors; and (3) providing a theoretical account of their strong performance on high-dimensional data with sparse compositional structure.
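As a concrete reading of the estimator the theory concerns, the following is a minimal sketch, assuming PyTorch, of a fully connected ReLU network fitted to $\{-1,+1\}$ labels by minimizing the hinge loss; the width, depth, optimizer, and synthetic batch are illustrative placeholders, not the network class or sample regime from the theorems.

```python
# Minimal sketch (not the authors' code): a ReLU network trained with the
# hinge loss phi(t) = max(0, 1 - t); the induced classifier is sign(f(x)).
# PyTorch is assumed; architecture, optimizer, and data are illustrative.
import torch
import torch.nn as nn

d, width, depth = 20, 64, 3          # input dimension and network size (illustrative)

layers, in_dim = [], d
for _ in range(depth):
    layers += [nn.Linear(in_dim, width), nn.ReLU()]
    in_dim = width
layers.append(nn.Linear(in_dim, 1))
f = nn.Sequential(*layers)           # f: [0, 1]^d -> R

def hinge_loss(scores, labels):
    # labels in {-1, +1}; empirical hinge risk (1/n) * sum max(0, 1 - y * f(x))
    return torch.clamp(1.0 - labels * scores.squeeze(-1), min=0.0).mean()

opt = torch.optim.Adam(f.parameters(), lr=1e-3)

# One gradient step on a synthetic batch; real data would follow a distribution
# satisfying the Tsybakov noise condition and the compositional assumption.
x = torch.rand(128, d)                       # inputs in [0, 1]^d
y = (torch.rand(128) > 0.5).float() * 2 - 1  # labels in {-1, +1}
opt.zero_grad()
loss = hinge_loss(f(x), y)
loss.backward()
opt.step()

predictions = torch.sign(f(x).squeeze(-1))   # classifier sign(f(x)), whose excess 0-1 risk the rates bound
```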

📝 Abstract
In this paper, we study the binary classification problem on $[0,1]^d$ under the Tsybakov noise condition (with exponent $s \in [0,\infty]$) and the compositional assumption. This assumption requires the conditional class probability function of the data distribution to be the composition of $q+1$ vector-valued multivariate functions, where each component function is either a maximum value function or a Hölder-$\beta$ smooth function that depends only on $d_*$ of its input variables. Notably, $d_*$ can be significantly smaller than the input dimension $d$. We prove that, under these conditions, the optimal convergence rate for the excess 0-1 risk of classifiers is $$ \left( \frac{1}{n} \right)^{\frac{\beta\cdot(1\wedge\beta)^q}{\frac{d_*}{s+1}+\left(1+\frac{1}{s+1}\right)\cdot\beta\cdot(1\wedge\beta)^q}}\,, $$ which is independent of the input dimension $d$. Additionally, we demonstrate that ReLU deep neural networks (DNNs) trained with hinge loss can achieve this optimal convergence rate up to a logarithmic factor. This result provides theoretical justification for the excellent performance of ReLU DNNs in practical classification tasks, particularly in high-dimensional settings. The technique used to establish these results extends the oracle inequality presented in our previous work. The generalized approach is of independent interest.
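Two illustrative specializations, read directly off the displayed exponent rather than quoted as separate results from the paper, help calibrate the rate: at $s=0$ the noise condition gives no gain, while letting $s\to\infty$ drives the rate toward the parametric rate $1/n$.

$$ s=0:\ \left(\frac{1}{n}\right)^{\frac{\beta\cdot(1\wedge\beta)^q}{d_*+2\beta\cdot(1\wedge\beta)^q}}, \qquad s\to\infty:\ \frac{d_*}{s+1}\to 0,\ \ 1+\frac{1}{s+1}\to 1,\ \text{ so the exponent tends to } 1 \text{ and the rate to } \frac{1}{n}. $$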
Problem

Research questions and friction points this paper is trying to address.

Study binary classification under Tsybakov noise and compositional assumptions
Determine optimal convergence rates for excess 0-1 risk of classifiers
Prove ReLU DNNs achieve near-optimal rates in high-dimensional settings
Innovation

Methods, ideas, or system contributions that make the work stand out.

ReLU DNNs achieve optimal convergence rates
Hinge loss training for high-dimensional classification
Oracle inequality extension for theoretical proof
Authors
Zihan Zhang
School of Mathematics and Statistics, The University of Sydney, Sydney NSW 2006, Australia
Lei Shi
School of Mathematical Sciences and Shanghai Key Laboratory for Contemporary Applied Mathematics, Fudan University, Shanghai 200433, China
Ding-Xuan Zhou
University of Sydney
Research interests: theory of deep learning, statistical learning, wavelets, approximation theory