🤖 AI Summary
This work studies score function estimation for score-based generative models (SGMs) given only $n$ i.i.d. $d$-dimensional samples from an $\alpha$-sub-Gaussian true distribution $P_0$. Using deep ReLU neural networks and optimizing under both the score matching loss and the mean squared error, we establish, *for the first time*, near-optimal generalization rates *without* requiring strong assumptions such as Lipschitz continuity of the score or a lower bound on the data density. Specifically, the mean squared error achieves $\tilde{O}(n^{-1})$, while the score matching loss converges at rate $\tilde{O}(n^{-1} t_0^{-d/2})$ for time steps $t_0 \gtrsim \alpha^2 n^{-2/d} \log n$. Our theory justifies early stopping as the mechanism for attaining nearly minimax-optimal rates. Moreover, by characterizing model capacity via Sobolev and Besov space regularity, we reveal an intrinsic interplay between network architecture and the smoothness of the underlying distribution.
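To make the stated threshold concrete, here is a minimal sketch, assuming hypothetical values of $n$, $d$, and $\alpha$ and an unspecified constant `c` hidden in the $\gtrsim$, that computes the early-stopping cutoff $t_0$ and the corresponding rates (up to logarithmic factors):

```python
import math

def earliest_time(n: int, d: int, alpha: float, c: float = 1.0) -> float:
    """Smallest admissible time step t_0 ~ c * alpha^2 * n^{-2/d} * log n.
    The constant c is not specified in the statement; c = 1.0 is illustrative."""
    return c * alpha**2 * n ** (-2.0 / d) * math.log(n)

def score_matching_rate(n: int, t0: float, d: int) -> float:
    """Score matching loss rate ~ n^{-1} * t_0^{-d/2}, up to log factors."""
    return (1.0 / n) * t0 ** (-d / 2.0)

n, d, alpha = 10_000, 8, 1.0          # hypothetical sample size, dimension, sub-Gaussian parameter
t0 = earliest_time(n, d, alpha)
print(f"t0 ≈ {t0:.4g}")
print(f"MSE rate ≈ {1.0 / n:.1e}  (i.e., O(n^-1) up to logs)")
print(f"score matching rate ≈ {score_matching_rate(n, t0, d):.3g}")
```

As the snippet makes visible, stopping earlier (smaller $t_0$) inflates the $t_0^{-d/2}$ factor, which is the trade-off the early-stopping strategy balances.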
Abstract
This paper studies the approximation and generalization abilities of score-based neural network generative models (SGMs) in estimating an unknown distribution $P_0$ from $n$ i.i.d. observations in $d$ dimensions. Assuming merely that $P_0$ is $\alpha$-sub-Gaussian, we prove that for any time step $t \in [t_0, n^{O(1)}]$, where $t_0 \geq O(\alpha^2 n^{-2/d}\log n)$, there exists a deep ReLU neural network with width $\leq O(\log^3 n)$ and depth $\leq O(n^{3/d}\log_2 n)$ that approximates the scores with $\tilde{O}(n^{-1})$ mean squared error and achieves a nearly optimal rate of $\tilde{O}(n^{-1}t_0^{-d/2})$ for score estimation, as measured by the score matching loss. Our framework is universal and can be used to establish convergence rates for SGMs under milder assumptions than in previous work. For example, assuming further that the target density $p_0$ lies in a Sobolev or Besov class, we demonstrate that, with an appropriate early stopping strategy, neural network-based SGMs attain nearly minimax convergence rates up to logarithmic factors. Our analysis removes several crucial assumptions required by earlier results, such as Lipschitz continuity of the score function or a strictly positive lower bound on the target density.
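For intuition about the objects in the abstract, the following is a minimal, purely illustrative sketch of denoising score matching with a deep ReLU network under the Ornstein–Uhlenbeck (variance-preserving) forward process commonly assumed in SGM analyses. The width and depth below are toy values chosen so the snippet runs, not the $O(\log^3 n)$ / $O(n^{3/d}\log_2 n)$ scalings of the existence result, and the Gaussian `x0` is a stand-in for real samples from $P_0$:

```python
import math
import torch
import torch.nn as nn

def relu_mlp(d: int, width: int, depth: int) -> nn.Sequential:
    """Deep ReLU network mapping (x, t) in R^{d+1} to a score estimate in R^d."""
    layers: list[nn.Module] = [nn.Linear(d + 1, width), nn.ReLU()]
    for _ in range(depth - 1):
        layers += [nn.Linear(width, width), nn.ReLU()]
    layers.append(nn.Linear(width, d))
    return nn.Sequential(*layers)

def dsm_loss(net: nn.Module, x0: torch.Tensor, t: torch.Tensor) -> torch.Tensor:
    """Denoising score matching for the OU forward process
    X_t = e^{-t} X_0 + sqrt(1 - e^{-2t}) Z, Z ~ N(0, I):
    regress the network on the conditional score -Z / sqrt(1 - e^{-2t})."""
    z = torch.randn_like(x0)
    sigma = torch.sqrt(1.0 - torch.exp(-2.0 * t))       # shape (batch, 1)
    xt = torch.exp(-t) * x0 + sigma * z                 # noised sample at time t
    pred = net(torch.cat([xt, t], dim=1))
    target = -z / sigma                                 # conditional score of X_t | X_0
    return ((pred - target) ** 2).sum(dim=1).mean()

n, d, alpha = 10_000, 8, 1.0                            # hypothetical values
t0 = alpha**2 * n ** (-2.0 / d) * math.log(n)           # earliest admissible time step
net = relu_mlp(d, width=64, depth=6)                    # toy size, not the paper's scalings
x0 = torch.randn(256, d)                                # stand-in for samples from P_0
t = t0 + torch.rand(256, 1) * (2.0 - t0)                # train only on t >= t0
print(dsm_loss(net, x0, t).item())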