Convergence Analysis of Natural Gradient Descent for Over-parameterized Physics-Informed Neural Networks

๐Ÿ“… 2024-08-01
๐Ÿ›๏ธ arXiv.org
๐Ÿ“ˆ Citations: 2
โœจ Influential: 0
๐Ÿ“„ PDF
๐Ÿค– AI Summary
In two-layer ReLU³ physics-informed neural networks (PINNs), gradient descent (GD) converges slowly: its rate depends on the smallest eigenvalue of the Gram matrix, and its learning rate must decay with the sample size. Method: This paper introduces natural gradient descent (NGD) into the ℓ²-regression optimization framework for over-parameterized PINNs, the first such application to PINNs. Contribution/Results: The authors establish theoretically that, under over-parameterization, NGD achieves linear convergence independent of the spectral structure of the Gram matrix, with a constant-order learning rate O(1). This removes the strong reliance of GD/stochastic GD on the spectral properties of the neural tangent kernel (NTK). The result yields the first optimization scheme for PINNs with rigorous convergence guarantees that are insensitive to the problem's condition number, significantly improving both training speed and stability.
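To make the update concrete, here is a minimal sketch of one NGD (Gauss-Newton) step for a two-layer ReLU³ regression network in the over-parameterized regime. It assumes only the hidden layer is trained and uses the kernel (n × n) form of the pseudoinverse update; the sizes, damping term, and toy data are illustrative assumptions, not values from the paper.

```python
import numpy as np

# Minimal sketch of one natural gradient (Gauss-Newton) step for a
# two-layer ReLU^3 network f(x) = sum_j a_j * sigma(w_j . x) on an
# L2 regression loss. All sizes and the toy data are illustrative.

rng = np.random.default_rng(0)
n, d, m = 20, 2, 512                       # samples, input dim, width (m*d >> n)
X = rng.normal(size=(n, d))
y = np.sin(X[:, 0])                        # toy regression targets

W = rng.normal(size=(m, d)) / np.sqrt(d)   # hidden weights (trained)
a = rng.choice([-1.0, 1.0], size=m) / np.sqrt(m)  # output weights (frozen)

def predict(W):
    # sigma(z) = max(z, 0)^3, the ReLU^3 activation
    return a @ np.maximum(W @ X.T, 0.0) ** 3        # shape (n,)

def jacobian(W):
    """J[i, :] = d f(x_i) / d vec(W), shape (n, m*d)."""
    H = W @ X.T                                     # pre-activations, (m, n)
    S = 3.0 * np.maximum(H, 0.0) ** 2               # sigma'(H)
    coeff = a[:, None] * S                          # (m, n)
    J = np.einsum('ji,ik->ijk', coeff, X)           # J[i,j,k] = a_j sigma'(h_ji) x_ik
    return J.reshape(n, m * d)

# One NGD step with learning rate eta = 1 (the O(1) rate): the update
# vec(W) <- vec(W) - J^+ r is computed through the small n x n kernel
# system J J^T instead of the huge (m*d) x (m*d) Fisher matrix.
r = predict(W) - y                                  # residuals
J = jacobian(W)
K = J @ J.T + 1e-8 * np.eye(n)                      # tiny damping for numerical safety
step = J.T @ np.linalg.solve(K, r)                  # minimum-norm J^+ r
W = W - step.reshape(m, d)

print("residual norm before/after:",
      np.linalg.norm(r), np.linalg.norm(predict(W) - y))
```

Solving the small n × n kernel system is what makes an O(1) step size viable: the step exactly cancels the residual of the linearized model, so the contraction does not depend on how ill-conditioned the Gram matrix is.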

๐Ÿ“ Abstract
First-order methods, such as gradient descent (GD) and stochastic gradient descent (SGD), have been proven effective in training neural networks. In the context of over-parameterization, a line of work has demonstrated that randomly initialized (stochastic) gradient descent converges to a globally optimal solution at a linear convergence rate for the quadratic loss function. However, the learning rate of GD for training two-layer neural networks exhibits poor dependence on the sample size and the Gram matrix, leading to a slow training process. In this paper, we show that for the $L^2$ regression problems, the learning rate can be improved from $\mathcal{O}(\lambda_0/n^2)$ to $\mathcal{O}(1/\|\bm{H}^{\infty}\|_2)$, which implies that GD actually enjoys a faster convergence rate. Furthermore, we generalize the method to GD in training two-layer Physics-Informed Neural Networks (PINNs), showing a similar improvement for the learning rate. Although the improved learning rate has a mild dependence on the Gram matrix, in practice it still needs to be set small enough because the eigenvalues of the Gram matrix are unknown. More importantly, the convergence rate is tied to the least eigenvalue of the Gram matrix, which can lead to slow convergence. In this work, we provide the convergence analysis of natural gradient descent (NGD) in training two-layer PINNs, demonstrating that the learning rate can be $\mathcal{O}(1)$ and that, at this rate, the convergence rate is independent of the Gram matrix.
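For intuition, the two regimes in the abstract can be written as schematic error contractions (a sketch with constants suppressed; the exact statements in the paper may differ), where $\bm{u}(k)$ collects the network predictions on the $n$ training samples at step $k$, $\bm{y}$ the targets, and $\lambda_0$ the least eigenvalue of $\bm{H}^{\infty}$:

```latex
% GD with the improved step size \eta = \mathcal{O}(1/\|\bm{H}^{\infty}\|_2):
% the contraction factor still involves the least eigenvalue \lambda_0
\|\bm{u}(k) - \bm{y}\|_2^2 \le \Bigl(1 - \tfrac{\eta \lambda_0}{2}\Bigr)^{k} \,\|\bm{u}(0) - \bm{y}\|_2^2

% NGD with \eta = \mathcal{O}(1): the contraction factor is free of the
% spectrum of the Gram matrix
\|\bm{u}(k) - \bm{y}\|_2^2 \le \Bigl(1 - \tfrac{\eta}{2}\Bigr)^{k} \,\|\bm{u}(0) - \bm{y}\|_2^2
```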
Problem

Research questions and friction points this paper is trying to address.

Improving the learning rate's dependence on the sample size and the Gram matrix
Analyzing the convergence of natural gradient descent for PINNs
Establishing a quadratic convergence rate for smooth activations in NGD (sketched below)
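The quadratic rate for smooth activations referenced above is the classical zero-residual Gauss-Newton behavior; schematically (a sketch under that assumption, with the paper's constants suppressed), writing $\bm{r}_k = \bm{u}(k) - \bm{y}$ for the residual after $k$ NGD steps:

```latex
% C depends on the smoothness of the activation and on the least
% singular value of the Jacobian near initialization (schematic)
\|\bm{r}_{k+1}\|_2 \le C \,\|\bm{r}_k\|_2^2
```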
Innovation

Methods, ideas, or system contributions that make the work stand out.

Natural gradient descent for faster convergence
O(1) learning rate with a convergence rate independent of the Gram matrix
Quadratic convergence rate for smooth activations
๐Ÿ”Ž Similar Papers
No similar papers found.
Xianliang Xu
Tsinghua University, Beijing, China.
Ting Du
Tsinghua University, Beijing, China.
Wang Kong
Nanjing University of Aeronautics and Astronautics, Nanjing, China.
Ye Li
Nanjing University of Aeronautics and Astronautics, Nanjing, China.
Zhongyi Huang
Professor of Mathematics, Tsinghua University
Research interests: scientific computing, multiscale methods, singular perturbation problems, high frequency waves