Overcoming Rank Collapse in Feedback Alignment

📅 2026-06-09
🤖 AI Summary
This work addresses the scalability limitations of Feedback Alignment (FA) in deep networks, which stem from the excessively low effective rank of error signals. The study identifies low-dimensional gradient dynamics as a critical bottleneck underlying FA’s failure to scale effectively. To enhance the representational capacity of error signals, the authors propose increasing the geometric dimensionality of both weight updates and activations: they employ the Muon optimizer to orthogonalize weight updates and introduce activation normalization in hidden layers to promote activation orthogonality. Evaluated on CIFAR-100, this approach improves the test accuracy of ResNet-18 by 9 percentage points over standard FA, substantially advancing the training performance of deep networks under the FA framework.
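The orthogonalisation of weight updates mentioned above can be illustrated with a minimal NumPy sketch. This is not the Muon implementation: Muon uses its own tuned iteration together with momentum, whereas the code below swaps in the textbook cubic Newton-Schulz iteration purely for illustration. The function name `orthogonalize_update` and the `steps` default are assumptions made for this example.

```python
import numpy as np

def orthogonalize_update(update, steps=30):
    """Push all singular values of `update` towards 1 (illustrative only).

    Uses the classic cubic Newton-Schulz iteration X <- 1.5*X - 0.5*X X^T X,
    which converges to the nearest semi-orthogonal matrix when the singular
    values of the starting point lie in (0, sqrt(3)).
    """
    # Dividing by the Frobenius norm guarantees every singular value is <= 1.
    x = update / (np.linalg.norm(update) + 1e-12)
    for _ in range(steps):
        x = 1.5 * x - 0.5 * (x @ x.T @ x)
    return x

# Hypothetical raw update whose singular values (3 and 2) are unequal.
raw_update = np.array([[3.0, 0.0, 0.0],
                       [0.0, 2.0, 0.0]])
orthogonal_update = orthogonalize_update(raw_update)
```

After the iteration, every direction of the update carries equal weight, which is one way an update can be spread across more dimensions than the raw error signal would reach on its own.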
📝 Abstract
Backpropagation (BP) is widely viewed as biologically implausible, in part because it requires feedback weights to be the transpose of forward weights for error propagation. Interestingly, when training a network with fixed random feedback weights to circumvent this issue, learning aligns the forward weights with the feedback weights, leading the backpropagated error signal to become an approximation of the standard gradient used by BP. This process, called Feedback Alignment (FA), occurs in MLPs and very shallow CNNs but does not scale well to deeper architectures. In this work, we first investigated differences between BP and FA models trained on CIFAR-10, focusing specifically on the effective rank of the error signal. We found that the FA error has a considerably lower rank and hence is constrained to a lower-dimensional subspace compared to BP, limiting exploration of the parameter space. Motivated by this observation, we evaluated two mechanisms for increasing the effective dimensionality of FA: Muon, an optimiser that orthogonalises weight updates; and hidden activity normalisation, which promotes activation orthogonality. Across larger architectures and benchmarks, we find that these methods consistently improve over FA baselines; for example, on CIFAR-100 with a ResNet-18, accuracy increases by 9 percentage points. Our results identify low-dimensional gradient dynamics as a key obstacle to scaling FA and suggest that inducing higher-dimensional update geometry is a promising route toward scaling alternatives to backpropagation.
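As a rough illustration of the two ideas in the abstract, the sketch below contrasts how BP and FA propagate an output error to a hidden layer and shows one common way to measure effective rank. The abstract does not state which definition of effective rank the authors use; the entropy-based definition here (the exponential of the Shannon entropy of the normalised singular values) is an assumption, as are the function names and the shape conventions.

```python
import numpy as np

def effective_rank(matrix, eps=1e-12):
    """Entropy-based effective rank: exp(H(p)), where p is the singular
    value spectrum normalised to sum to 1. Assumed definition; the paper
    may use a different one."""
    s = np.linalg.svd(matrix, compute_uv=False)
    p = s / (s.sum() + eps)
    p = p[p > eps]  # drop numerically zero entries so log() is defined
    return float(np.exp(-(p * np.log(p)).sum()))

def hidden_error_signals(forward_weights, feedback_weights, output_error):
    """Propagate an output error of shape (batch, out) to the hidden layer.

    forward_weights:  (out, hidden), the layer's forward matrix W.
    feedback_weights: (out, hidden), a fixed random matrix B used by FA.
    Returns (bp_error, fa_error), each of shape (batch, hidden), before
    multiplication by the activation derivative.
    """
    bp_error = output_error @ forward_weights   # BP uses W itself (W^T in column form)
    fa_error = output_error @ feedback_weights  # FA replaces W with fixed B
    return bp_error, fa_error
```

Calling `effective_rank` on the `(batch, hidden)` error matrices returned by `hidden_error_signals` gives one number per method, which is the kind of comparison the abstract describes between BP and FA.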
Problem

Research questions and friction points this paper is trying to address.

Feedback Alignment
rank collapse
error signal
effective rank
deep neural networks
Innovation

Methods, ideas, or system contributions that make the work stand out.

Feedback Alignment
rank collapse
effective rank
orthogonal updates
activation normalization