🤖 AI Summary
This study addresses the curse of dimensionality in high-dimensional function approximation by analyzing the approximation capabilities of deep ReLU networks in $L^p$ norms for functions exhibiting anisotropic and mixed smoothness. By introducing the notion of mean smoothness $\tilde{s}$, the classical isotropic theory is extended to anisotropic and mixed smoothness Besov spaces. Leveraging tools from Besov space theory, Sobolev embeddings, and constructive network analysis, the authors establish approximation rates governed by this mean smoothness. The main contributions are approximation bounds of order $O((WL)^{-2\tilde{s}})$ for anisotropic Besov classes and $O((WL)^{-2s})$ (up to logarithmic factors) for mixed smoothness classes, where $W$ and $L$ denote the network width and depth, respectively. As an application, deep ReLU networks are shown to achieve minimax optimal rates, up to logarithmic factors, for a wide range of smooth function classes.
📝 Abstract
This paper studies how efficiently deep ReLU neural networks can approximate and learn smooth functions. When the error is measured in the $L^p([0,1]^d)$ norm and the approximator is a network with width $W$ and depth $L$, recent works have proven the super approximation rate $\mathcal{O}((WL)^{-2s/d})$ for the Besov space $\mathcal{B}^s_{q,r}([0,1]^d)$ under the Sobolev embedding condition $s/d>1/q-1/p$. In order to overcome the curse of dimensionality in this rate, we extend this result to anisotropic and mixed smooth function classes. We establish the approximation rate $\mathcal{O}((WL)^{-2\tilde{s}})$ for the anisotropic Besov space $\mathcal{B}^{\boldsymbol{s}}_{q,r}([0,1]^d)$ with anisotropic smoothness $\boldsymbol{s}=(s_1,\dots,s_d)$ under the embedding condition $\tilde{s} > 1/q-1/p$, where the mean smoothness is $\tilde{s} = (\sum_{i=1}^d s_i^{-1})^{-1}$. For the mixed smooth Besov space $\mathcal{MB}^s_{q,r}([0,1]^d)$ with mixed smoothness $s>1/q-1/p$, we show that the approximation rate $\mathcal{O}((WL)^{-2s})$ holds up to logarithmic factors. Using these results, we also derive approximation bounds for the composition of anisotropic Besov functions. As an application, it is shown that deep ReLU neural networks can achieve minimax optimal rates up to logarithmic factors for a wide range of smooth function classes.
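To make the mean smoothness concrete, here is a short worked example. The dimension $d=3$ and the smoothness vector $\boldsymbol{s}=(1,2,2)$ are illustrative choices and are not taken from the paper. In the isotropic case $s_1=\dots=s_d=s$, the definition reduces to the known exponent:

$$
\tilde{s} = \Big(\sum_{i=1}^d s^{-1}\Big)^{-1} = \Big(\frac{d}{s}\Big)^{-1} = \frac{s}{d},
\qquad\text{so}\qquad (WL)^{-2\tilde{s}} = (WL)^{-2s/d}.
$$

For the assumed anisotropic vector $\boldsymbol{s}=(1,2,2)$:

$$
\tilde{s} = \Big(1 + \tfrac{1}{2} + \tfrac{1}{2}\Big)^{-1} = \tfrac{1}{2},
\qquad\text{so}\qquad (WL)^{-2\tilde{s}} = (WL)^{-1}.
$$

By comparison, treating the same function as isotropic with only its smallest smoothness $s=1$ would give the exponent $2s/d = 2/3$, that is, the slower rate $(WL)^{-2/3}$. The embedding condition $\tilde{s} > 1/q - 1/p$ in this example reads $1/2 > 1/q - 1/p$.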