Convergence of Implicit Gradient Descent for Training Two-Layer Physics-Informed Neural Networks

📅 2024-07-03
🏛️ arXiv.org
📈 Citations: 1
Influential: 0
🤖 AI Summary
Standard gradient descent (GD) often suffers from slow convergence and stagnation at suboptimal solutions when training physics-informed neural networks (PINNs) for multiscale physical problems. Method: This paper investigates the convergence properties of implicit gradient descent (IGD) applied to overparameterized two-layer PINNs. Contribution/Results: We establish, for the first time, a linear convergence guarantee for IGD on PINNs—significantly relaxing width requirements compared to prior GD analyses—and prove that the learning rate can be chosen independently of both sample size and the smallest eigenvalue of the Gram matrix. Technically, our analysis integrates Gram matrix positive-definiteness with overparameterization theory, accommodating smooth activation functions including sigmoid, tanh, and softplus. Empirical results demonstrate that IGD achieves faster convergence and higher accuracy than standard GD on multiscale partial differential equation modeling tasks.

📝 Abstract
Optimization algorithms are crucial in training physics-informed neural networks (PINNs), as unsuitable methods may lead to poor solutions. Compared to the common gradient descent (GD) algorithm, implicit gradient descent (IGD) outperforms it in handling certain multi-scale problems. In this paper, we provide a convergence analysis for IGD in training over-parameterized two-layer PINNs. We first demonstrate the positive definiteness of Gram matrices for some general smooth activation functions, such as the sigmoid, softplus, and tanh functions, among others. Over-parameterization then allows us to prove that randomly initialized IGD converges to a globally optimal solution at a linear convergence rate. Moreover, due to the distinct training dynamics of IGD compared to GD, the learning rate can be selected independently of the sample size and the least eigenvalue of the Gram matrix. Additionally, the novel approach used in our convergence analysis imposes a milder requirement on the network width. Finally, empirical results validate our theoretical findings.
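The stability property claimed in the abstract can be illustrated on a toy problem. The sketch below (an illustrative example, not the paper's experiment) contrasts explicit GD with implicit GD, θ_{k+1} = θ_k − η∇L(θ_{k+1}), on a stiff one-dimensional quadratic loss whose large curvature `a` stands in for an ill-conditioned Gram matrix; the Newton solver for the implicit equation is one possible choice of inner solver.

```python
# Stiff 1-D toy loss L(theta) = 0.5 * a * theta**2; the large curvature `a`
# mimics the ill-conditioning of multiscale PINN losses. All names and
# constants here are illustrative assumptions, not taken from the paper.
a = 100.0
grad = lambda th: a * th   # dL/dtheta
hess = lambda th: a        # d^2L/dtheta^2

def gd_step(theta, lr):
    """Explicit GD: theta_{k+1} = theta_k - lr * grad(theta_k).
    Diverges on this loss once lr > 2 / a."""
    return theta - lr * grad(theta)

def igd_step(theta, lr, newton_iters=20):
    """Implicit GD (backward-Euler step): solve x = theta - lr * grad(x)
    for x via Newton's method on F(x) = x - theta + lr * grad(x)."""
    x = theta
    for _ in range(newton_iters):
        F = x - theta + lr * grad(x)
        x = x - F / (1.0 + lr * hess(x))
    return x

lr = 0.1                    # far above GD's stability limit 2/a = 0.02
th_gd, th_igd = 1.0, 1.0
for _ in range(10):
    th_gd = gd_step(th_gd, lr)
    th_igd = igd_step(th_igd, lr)

print(abs(th_gd))   # explicit GD blows up: |1 - lr*a| = 9 per step
print(abs(th_igd))  # implicit GD contracts: factor 1/(1 + lr*a) per step
```

With the same learning rate, explicit GD amplifies the iterate by a factor of 9 each step while implicit GD shrinks it by a factor of 11, which is the sense in which the IGD learning rate can be chosen without regard to the smallest eigenvalue (here, the curvature) of the problem.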
Problem

Research questions and friction points this paper is trying to address.

Analyzing convergence of implicit gradient descent for two-layer PINNs
Proving IGD achieves global optimality in over-parameterized PINNs
Demonstrating IGD's learning rate independence from sample size
Innovation

Methods, ideas, or system contributions that make the work stand out.

Implicit Gradient Descent for PINNs training
Linear convergence with over-parameterization
Learning rate independent of sample size
Xianliang Xu
Tsinghua University, Beijing, China.
Zhongyi Huang
Professor of mathematics, Tsinghua University
Scientific computing, multiscale methods, singular perturbation problems, high frequency waves
Ye Li
Nanjing University of Aeronautics and Astronautics, Nanjing, China.