🤖 AI Summary
Deep RePU networks suffer from gradient instability caused by exploding or vanishing activations, which hinders training. This work models, for the first time, the dynamical critical behavior of activation functions from an effective field theory perspective, revealing that RePU's instability stems from phase-transition critical points induced by its nonlinear order. Building on this insight, we propose the Modified RePU (MRePU) activation function, which preserves differentiability and universal approximation capability while rigorously guaranteeing critical stability during training. We theoretically establish its convergence properties and validate it comprehensively across polynomial regression, physics-informed neural networks (PINNs), and real-world vision tasks. Empirical results show that MRePU significantly enhances training stability, accelerates convergence, and improves generalization, consistently outperforming both RePU and ReLU.
📝 Abstract
The Rectified Power Unit (RePU) activation function, a differentiable generalization of the Rectified Linear Unit (ReLU), has shown promise in constructing neural networks due to its smoothness properties. However, deep RePU networks often suffer from vanishing or exploding activations during training, rendering them unstable regardless of initialization hyperparameters. Leveraging the perspective of effective field theory, we identify the root causes of these failures and propose the Modified Rectified Power Unit (MRePU) activation function. MRePU addresses RePU's limitations while preserving its advantages, such as differentiability and universal approximation properties. Theoretical analysis demonstrates that MRePU satisfies the criticality conditions necessary for stable training, placing it in a distinct universality class. Extensive experiments validate the effectiveness of MRePU, showing significant improvements in training stability and performance across various tasks, including polynomial regression, physics-informed neural networks (PINNs), and real-world vision tasks. Our findings highlight the potential of MRePU as a robust alternative for building deep neural networks.
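The instability the abstract describes can be seen in a minimal forward-pass experiment, assuming the standard RePU definition RePU_p(x) = max(0, x)^p (p = 1 recovers ReLU). The network sizes, gains, and depth below are illustrative choices, and the paper's MRePU fix is not reproduced here; the sketch only shows why plain RePU lacks a critical initialization:

```python
import numpy as np

def repu(x, p):
    """Rectified Power Unit: max(0, x)**p (p = 1 recovers ReLU)."""
    return np.maximum(0.0, x) ** p

def forward_norms(p, gain, depth=10, width=64, seed=0):
    """Propagate a random input through `depth` random linear layers
    with RePU_p activations and record the activation norm per layer."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(width)
    norms = []
    for _ in range(depth):
        # Weight variance gain/width; gain=2 is the Kaiming/He scale for ReLU.
        W = rng.standard_normal((width, width)) * np.sqrt(gain / width)
        x = repu(W @ x, p)
        norms.append(float(np.linalg.norm(x)))
    return norms

# ReLU at its critical (Kaiming) initialization: norms stay of order 1 in depth.
relu_norms = forward_norms(p=1, gain=2.0)

# RePU with p=2: the layer-to-layer norm map is quadratic rather than linear,
# so no single gain keeps the signal at criticality; norms explode or collapse
# doubly exponentially with depth.
repu_norms = forward_norms(p=2, gain=1.0)
```

Because the squared-norm recursion for p = 2 is roughly r → c·r²/width, it has an unstable fixed point: inputs above it blow up and inputs below it die out, which matches the "unstable regardless of initialization" behavior described above.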