🤖 AI Summary
While automatic differentiation (AD) empirically outperforms finite differences (FD) in training physics-informed neural networks (PINNs) for partial differential equations (PDEs), the theoretical basis for this advantage, particularly its effect on the residual loss and on training dynamics, has not been quantified.
Method: The authors introduce *truncated entropy*, a theoretical metric that jointly characterizes the residual loss and the optimization speed. Combining random feature analysis, two-layer network theory, numerical experiments, and information-theoretic entropy measures, they establish a quantitative framework for comparing AD and FD.
Contribution/Results: The authors prove that AD accelerates convergence and improves stability in the training dynamics of neural PDE solvers. Numerical results show strong correlations between truncated entropy and both the empirical loss decay and the convergence rate. This work provides a first quantitative theoretical justification, with empirical validation, for the use of AD in neural PDE solvers.
📝 Abstract
Neural network-based approaches have recently shown significant promise in solving partial differential equations (PDEs) in science and engineering, especially in scenarios featuring complex domains or the incorporation of empirical data. One advantage of neural network methods for PDEs lies in their use of automatic differentiation (AD), which requires only the sample points themselves, unlike traditional finite difference (FD) approximations, which require nearby local points to compute derivatives. In this paper, we quantitatively demonstrate the advantage of AD in training neural networks. The concept of truncated entropy is introduced to characterize the training behavior. Specifically, through comprehensive experimental and theoretical analyses of random feature models and two-layer neural networks, we find that the truncated entropy serves as a reliable metric for quantifying the residual loss of random feature models and the training speed of neural networks under both AD and FD. Our experimental and theoretical analyses demonstrate that, from a training perspective, AD outperforms FD in solving PDEs.
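To make the AD-vs-FD distinction concrete, the following minimal sketch (not from the paper) contrasts the two ways of differentiating a smooth surrogate `u(x)`, here a toy stand-in for a PINN output. Forward-mode AD, implemented with a tiny dual-number class, evaluates the exact derivative at the sample point itself, while central finite differences must evaluate `u` at the nearby points `x ± h` and incur an O(h²) truncation error:

```python
import math

class Dual:
    """Minimal forward-mode AD: carries a value and its derivative."""
    def __init__(self, val, dot=0.0):
        self.val, self.dot = val, dot
    def __add__(self, o):
        o = o if isinstance(o, Dual) else Dual(o)
        return Dual(self.val + o.val, self.dot + o.dot)
    __radd__ = __add__
    def __mul__(self, o):
        o = o if isinstance(o, Dual) else Dual(o)
        # product rule propagates the derivative
        return Dual(self.val * o.val, self.val * o.dot + self.dot * o.val)
    __rmul__ = __mul__

def tanh(x):
    if isinstance(x, Dual):
        t = math.tanh(x.val)
        return Dual(t, (1.0 - t * t) * x.dot)  # chain rule for tanh
    return math.tanh(x)

def u(x):
    # toy smooth function standing in for a network output u(x)
    return tanh(3.0 * x) + 0.5 * x * x

def ad_grad(f, x):
    """Exact derivative at x via forward-mode AD: needs only x itself."""
    return f(Dual(x, 1.0)).dot

def fd_grad(f, x, h=1e-3):
    """Central FD: needs the nearby points x ± h, truncation error O(h^2)."""
    return (f(x + h) - f(x - h)) / (2.0 * h)

x0 = 0.4
exact = 3.0 / math.cosh(3.0 * x0) ** 2 + x0  # analytic u'(x0)
print(abs(ad_grad(u, x0) - exact))  # ~ machine precision
print(abs(fd_grad(u, x0) - exact))  # nonzero truncation error
```

The AD error is at the level of floating-point round-off, whereas the FD error is dominated by the truncation term that the paper's truncated entropy is designed to account for; shrinking `h` reduces the truncation error but eventually amplifies round-off, a trade-off AD avoids entirely.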