Gradient-free training of recurrent neural networks

📅 2024-10-30
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address gradient explosion/vanishing, training instability, and high computational cost inherent in backpropagation through time (BPTT) for RNNs, this paper proposes a gradient-free end-to-end training paradigm. It fixes random hidden-layer weights and models temporal dynamics via Koopman operator theory, leveraging extended dynamic mode decomposition (EDMD) to analytically solve for output-layer weights. This work is the first to integrate Koopman theory with randomized feature networks, yielding theoretically interpretable and numerically stable RNN constructions. Evaluated on chaotic system forecasting, meteorological modeling, and control tasks, the method achieves significantly faster training, surpasses state-of-the-art gradient-based methods in prediction accuracy, and exhibits strong convergence robustness.

📝 Abstract
Recurrent neural networks are a successful neural architecture for many time-dependent problems, including time series analysis, forecasting, and modeling of dynamical systems. Training such networks with backpropagation through time is a notoriously difficult problem because their loss gradients tend to explode or vanish. In this contribution, we introduce a computational approach to construct all weights and biases of a recurrent neural network without using gradient-based methods. The approach is based on a combination of random feature networks and Koopman operator theory for dynamical systems. The hidden parameters of a single recurrent block are sampled at random, while the outer weights are constructed using extended dynamic mode decomposition. This approach alleviates all problems with backpropagation commonly related to recurrent networks. The connection to Koopman operator theory also allows us to start using results in this area to analyze recurrent neural networks. In computational experiments on time series, forecasting for chaotic dynamical systems, and control problems, as well as on weather data, we observe that the training time and forecasting accuracy of the recurrent neural networks we construct are improved when compared to commonly used gradient-based methods.
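The abstract describes the core recipe: sample the hidden parameters of a single recurrent block at random, run the recurrence over the data, and obtain the outer weights by a linear least-squares solve in the spirit of extended dynamic mode decomposition, so no gradients are ever propagated through time. A minimal sketch of that idea on a toy one-dimensional series (this is an illustrative reconstruction with assumed weight scalings, not the authors' exact construction or sampling scheme):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 1-D time series: a sine wave, standing in for a dynamical system.
T = 500
x = np.sin(np.linspace(0, 20 * np.pi, T + 1))

# Hidden parameters of a single recurrent block, sampled once and fixed.
# The scalings here are assumptions for illustration, not the paper's choice.
n_hidden = 200
W_in = rng.normal(scale=1.0, size=(n_hidden, 1))
W_h = rng.normal(scale=1.0 / np.sqrt(n_hidden), size=(n_hidden, n_hidden))
b = rng.uniform(-1.0, 1.0, size=n_hidden)

# Run the recurrence once; the hidden states serve as a random feature
# dictionary evaluated along the trajectory.
h = np.zeros(n_hidden)
H = np.empty((T, n_hidden))
for t in range(T):
    h = np.tanh(W_in @ x[t:t + 1] + W_h @ h + b)
    H[t] = h

# EDMD-style step: the only "trained" parameters are the outer weights,
# found by linear least squares mapping hidden states to the next value.
W_out, *_ = np.linalg.lstsq(H, x[1:T + 1], rcond=None)

pred = H @ W_out
rmse = np.sqrt(np.mean((pred - x[1:T + 1]) ** 2))
```

Because the readout is a single linear solve, training cost is one pass over the data plus a least-squares problem, which is where the reported speedup over BPTT comes from.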
Problem

Research questions and friction points this paper is trying to address.

Vanishing and Exploding Gradients
Recurrent Neural Networks
Training Efficiency
Innovation

Methods, ideas, or system contributions that make the work stand out.

Koopman Operator Theory
Random Feature Networks
Extended Dynamic Mode Decomposition