🤖 AI Summary
Traditional backpropagation (BP) suffers from high computational overhead and scales poorly to deep networks, while existing forward-only methods (e.g., PEPITA) are constrained to shallow architectures. This paper proposes FOTON, the first framework to theoretically identify the root cause of forward-training failure and to establish a linear, orthogonal setting in which forward-only training is equivalent to BP; relaxing the linearity assumption then enables scaling to deep networks. FOTON integrates orthogonal weight constraints, forward gradient approximation, implicit differentiation, and convolution-aware structural design. It significantly outperforms PEPITA on both fully connected and convolutional networks, successfully training models exceeding 100 layers while achieving substantial speedups in training time. The implementation is publicly available.
📝 Abstract
Backpropagation is still the de facto algorithm used today to
train neural networks.
With the exponential growth of recent architectures, the
computational cost of this algorithm has also become a burden. The
recent PEPITA and forward-only frameworks have proposed promising
alternatives, but they fail to scale beyond a handful of hidden
layers, which limits their use.
In this paper, we first analyze theoretically the main limitations of
these approaches. This analysis allows us to design a forward-only
algorithm that is equivalent to backpropagation under linearity
and orthogonality assumptions. By relaxing the linearity assumption, we
then introduce FOTON (Forward-Only Training of Orthogonal Networks),
which bridges the gap with the backpropagation
algorithm. Experimental results show that it outperforms PEPITA,
enabling us to train neural networks of any depth without the need
for a backward pass.
Moreover, its performance on convolutional networks clearly opens up avenues for its application to more
advanced architectures. The code is open-sourced at https://github.com/p0lcAi/FOTON.
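As a minimal numerical sketch of the linear, orthogonal setting mentioned in the abstract (not FOTON itself, whose algorithm is not detailed here), the snippet below shows why orthogonality matters: for an orthogonal output weight matrix, the transpose that backpropagation needs coincides with the inverse of the forward map, so the output error can in principle be projected back without a dedicated backward pass. All variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 2-layer linear net: h = W1 @ x, y = W2 @ h (illustrative only).
n = 8
W1 = rng.standard_normal((n, n))
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))  # QR yields an orthogonal matrix
W2 = Q

x = rng.standard_normal((n, 1))
delta = rng.standard_normal((n, 1))  # error signal at the output

# Backprop's gradient for W1 projects the error through W2's transpose:
grad_bp = (W2.T @ delta) @ x.T

# A transpose-free scheme could instead invert the forward map W2;
# for an orthogonal W2 the two projections coincide (W2^-1 == W2^T).
grad_fwd = (np.linalg.inv(W2) @ delta) @ x.T

print(np.allclose(grad_bp, grad_fwd))  # True, since W2 is orthogonal
```

With a generic (non-orthogonal) `W2`, the two projections differ, which is one way to read the abstract's claim that the equivalence holds only under the orthogonality assumption.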