🤖 AI Summary
Optimizing deep neural networks for solving partial differential equations (PDEs) remains challenging due to highly nonconvex loss landscapes, which lead to poor convergence, trapping in local minima, and exploding or vanishing gradients. To address this, we propose the Layer-Separation (LySep) model. LySep introduces auxiliary variables to decouple strong inter-layer dependencies in deep networks, reformulating the original problem into an equivalent form in which coupling occurs only between adjacent layers. We then develop an optimization framework based on the Alternating Direction Method of Multipliers (ADMM), enabling closed-form updates for most variables. We prove theoretically that LySep is equivalent to the original neural network model in terms of optimal solutions. Numerical experiments demonstrate that LySep significantly reduces both the training loss and the PDE solution error, consistently outperforming state-of-the-art physics-informed neural networks (PINNs) and other baselines, particularly in high-dimensional settings.
📝 Abstract
In this paper, we propose a new optimization framework, the layer separation (LySep) model, to improve deep learning-based methods for solving partial differential equations. Due to the highly non-convex nature of the loss function in deep learning, existing optimization algorithms often converge to suboptimal local minima or suffer from exploding or vanishing gradients, resulting in poor performance. To address these issues, we introduce auxiliary variables that separate the layers of deep neural networks. Specifically, the output of each layer and its derivatives are represented by auxiliary variables, effectively decomposing the deep architecture into a series of shallow architectures. New loss functions with auxiliary variables are established, in which only variables from two neighboring layers are coupled. Corresponding algorithms based on alternating directions are developed, in which many variables can be updated optimally in closed form. Moreover, we provide theoretical analyses demonstrating the consistency between the LySep model and the original deep model. High-dimensional numerical results validate our theory and demonstrate the advantages of LySep in minimizing loss and reducing solution error.
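To make the layer-separation idea concrete, the following is a minimal toy sketch (not the paper's algorithm, which handles nonlinear activations, PDE residual terms, and ADMM multipliers): a two-layer linear network is fit by introducing an auxiliary variable `U` for the hidden-layer output and penalizing the coupling constraint `U = W1 @ X` quadratically. Each alternating update then involves only two neighboring layers and admits a closed-form least-squares solution. All variable names and the penalty weight `beta` are illustrative choices, not from the paper.

```python
import numpy as np

# Toy regression target: Y = W_true @ X, to be fit by a two-layer
# linear network Y ≈ W2 @ (W1 @ X).
rng = np.random.default_rng(0)
d, h, n = 5, 8, 200
X = rng.standard_normal((d, n))
W_true = rng.standard_normal((1, d))
Y = W_true @ X

W1 = 0.1 * rng.standard_normal((h, d))
W2 = 0.1 * rng.standard_normal((1, h))
U = W1 @ X      # auxiliary variable standing in for the hidden-layer output
beta = 10.0     # penalty weight on the layer-coupling constraint U = W1 @ X
eps = 1e-8      # small ridge term to keep the normal equations well-posed

for _ in range(50):
    # Update W1: minimize ||U - W1 X||^2 over W1 (closed-form least squares;
    # touches only the input layer and the auxiliary variable U).
    W1 = U @ X.T @ np.linalg.inv(X @ X.T + eps * np.eye(d))
    # Update U: minimize ||Y - W2 U||^2 + beta * ||U - W1 X||^2 over U
    # (closed form; couples only the two neighboring layers).
    A = W2.T @ W2 + beta * np.eye(h)
    U = np.linalg.solve(A, W2.T @ Y + beta * (W1 @ X))
    # Update W2: minimize ||Y - W2 U||^2 over W2 (closed-form least squares).
    W2 = Y @ U.T @ np.linalg.inv(U @ U.T + eps * np.eye(h))

loss = float(np.mean((Y - W2 @ U) ** 2))
print(f"data-fit loss: {loss:.3e}")
```

Because every subproblem is a small least-squares solve, no gradient has to propagate through the full depth of the network, which is the mechanism the abstract credits for avoiding exploding or vanishing gradients. The full LySep model additionally represents derivatives of each layer's output by auxiliary variables so that PDE residuals can be expressed layer-locally.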