Deep Learning Optimization Using Self-Adaptive Weighted Auxiliary Variables

📅 2025-04-30
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Gradient descent in least-squares learning of deep neural networks—including physics-informed neural networks (PINNs)—suffers from slow convergence and low accuracy due to the highly non-convex loss landscape and vanishing gradients. Method: We propose a hierarchical auxiliary-variable-based adaptive weighting optimization framework. By introducing layer-wise decoupled auxiliary variables and reformulating the loss function—guided by Lagrangian relaxation principles and an adaptive weight mechanism—we rigorously preserve equivalence to the original mean squared error under theoretical guarantees. Contribution/Results: This is the first method to enable adaptive adjustment of auxiliary variable weights, effectively mitigating inter-layer coupling and gradient degradation. Numerical experiments across diverse network architectures demonstrate that our approach accelerates convergence by 2–5× over standard gradient descent, while achieving higher final accuracy and improved robustness.

Technology Category

Application Category

📝 Abstract
In this paper, we develop a new optimization framework for the least squares learning problem via fully connected neural networks or physics-informed neural networks. The gradient descent sometimes behaves inefficiently in deep learning because of the high non-convexity of loss functions and the vanishing gradient issue. Our idea is to introduce auxiliary variables to separate the layers of the deep neural networks and reformulate the loss functions for ease of optimization. We design the self-adaptive weights to preserve the consistency between the reformulated loss and the original mean squared loss, which guarantees that optimizing the new loss helps optimize the original problem. Numerical experiments are presented to verify the consistency and show the effectiveness and robustness of our models over gradient descent.
Problem

Research questions and friction points this paper is trying to address.

Addresses inefficiency in deep learning gradient descent
Introduces auxiliary variables to separate neural network layers
Ensures consistency between reformulated and original loss functions
Innovation

Methods, ideas, or system contributions that make the work stand out.

Introduces auxiliary variables for layer separation
Uses self-adaptive weights for loss consistency
Reformulates loss functions for easier optimization
🔎 Similar Papers
No similar papers found.
Yaru Liu
Yaru Liu
University of Electronic Science and Technology of China
Yiqi Gu
Yiqi Gu
University of Electronic Science and Technology of China
Applied Mathematics
M
Michael K. Ng
Department of Mathematics, Hong Kong Baptist University, Kowloon Tong, Hong Kong, China.