Average gradient outer product as a mechanism for deep neural collapse

📅 2024-02-21
🏛️ arXiv.org
📈 Citations: 10 (influential: 0)
🤖 AI Summary
This paper addresses the data-dependent nature of Deep Neural Collapse (DNC), the phenomenon in which the last-layer representations of deep neural networks (DNNs) become highly rigid, with within-class representations collapsing to their class means and the class means forming an equiangular structure. Method: The authors identify the Average Gradient Outer Product (AGOP), the uncentered covariance of a learned predictor's input-output gradients averaged over the training set, as a mechanism driving both feature learning and collapse. They study the Deep Recursive Feature Machine (Deep RFM), which constructs a network layer by layer by mapping the data with the AGOP and applying an untrained random feature map, prove that DNC arises asymptotically in Deep RFM as a consequence of kernel learning, and verify the mechanism empirically in standard settings. Contribution/Results: For DNNs trained in the feature learning regime, singular value analysis shows that the right singular vectors and values of the weight matrices, which recent work has found to be highly correlated with the AGOP, account for the majority of within-class variability collapse. The work thus provides a data-dependent account of DNC, in contrast to data-agnostic explanations such as the unconstrained features model.
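
The AGOP itself is straightforward to compute once a predictor is fixed. Below is a minimal sketch of the definition used in the summary: the uncentered covariance of input-output Jacobians averaged over a training set. The one-hidden-layer ReLU network, synthetic data, and dimensions are hypothetical stand-ins chosen for illustration, not the paper's experimental setup.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical one-hidden-layer ReLU predictor f(x) = W2 @ relu(W1 @ x).
d, h, c, n = 20, 64, 5, 200          # input dim, hidden width, outputs, samples
W1 = rng.normal(size=(h, d)) / np.sqrt(d)
W2 = rng.normal(size=(c, h)) / np.sqrt(h)
X = rng.normal(size=(n, d))          # synthetic training inputs

def input_output_jacobian(x):
    """Jacobian of f at x, shape (c, d), computed analytically for ReLU."""
    pre = W1 @ x                      # hidden pre-activations
    D = (pre > 0).astype(float)       # ReLU derivative (0/1 mask)
    return W2 @ (D[:, None] * W1)     # W2 @ diag(relu'(W1 x)) @ W1

# AGOP: uncentered covariance of input-output gradients over the training set,
# AGOP = (1/n) * sum_i J(x_i)^T J(x_i), a (d, d) positive semidefinite matrix.
agop = np.zeros((d, d))
for x in X:
    J = input_output_jacobian(x)
    agop += J.T @ J
agop /= n

# Its top eigendirections are the input directions the predictor is most sensitive to.
eigvals = np.linalg.eigvalsh(agop)
print("top AGOP eigenvalues:", np.round(eigvals[-5:][::-1], 3))
```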

📝 Abstract
Deep Neural Collapse (DNC) refers to the surprisingly rigid structure of the data representations in the final layers of Deep Neural Networks (DNNs). Though the phenomenon has been measured in a variety of settings, its emergence is typically explained via data-agnostic approaches, such as the unconstrained features model. In this work, we introduce a data-dependent setting where DNC forms due to feature learning through the average gradient outer product (AGOP). The AGOP is defined with respect to a learned predictor and is equal to the uncentered covariance matrix of its input-output gradients averaged over the training dataset. The Deep Recursive Feature Machine (Deep RFM) is a method that constructs a neural network by iteratively mapping the data with the AGOP and applying an untrained random feature map. We demonstrate empirically that DNC occurs in Deep RFM across standard settings as a consequence of the projection with the AGOP matrix computed at each layer. Further, we theoretically explain DNC in Deep RFM in an asymptotic setting and as a result of kernel learning. We then provide evidence that this mechanism holds for neural networks more generally. In particular, we show that the right singular vectors and values of the weights can be responsible for the majority of within-class variability collapse for DNNs trained in the feature learning regime. As observed in recent work, this singular structure is highly correlated with that of the AGOP.
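
As a loose illustration of the Deep RFM construction described in the abstract, the sketch below alternates an AGOP-based transformation of the data with an untrained random ReLU feature map and tracks a simple within-class variability ratio layer by layer. For simplicity it estimates each layer's AGOP from a ridge-regression predictor rather than the kernel-based RFM predictor the paper uses, and the data, feature map, and hyperparameters are assumptions made for this sketch.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical synthetic classification data (not the paper's benchmarks).
n_per_class, n_classes, d = 100, 4, 30
means = rng.normal(size=(n_classes, d)) * 2.0
X = np.vstack([m + rng.normal(size=(n_per_class, d)) for m in means])
y = np.repeat(np.arange(n_classes), n_per_class)
Y = np.eye(n_classes)[y]                       # one-hot targets

def within_class_variability(H, y):
    """Ratio of within-class to total variance (smaller = more collapsed)."""
    within = sum(((H[y == k] - H[y == k].mean(axis=0)) ** 2).sum()
                 for k in np.unique(y))
    total = ((H - H.mean(axis=0)) ** 2).sum()
    return within / total

def matrix_sqrt(M):
    """Square root of a symmetric PSD matrix via eigendecomposition."""
    w, V = np.linalg.eigh(M)
    return V @ np.diag(np.sqrt(np.clip(w, 0, None))) @ V.T

width, n_layers, ridge = 256, 4, 1e-3
H = X.copy()
for layer in range(n_layers):
    # 1) Fit a simple ridge predictor on the current representation
    #    (a stand-in for the kernel RFM predictor used in the paper).
    d_cur = H.shape[1]
    B = np.linalg.solve(H.T @ H + ridge * np.eye(d_cur), H.T @ Y)   # (d_cur, c)
    # 2) The AGOP of the linear predictor f(h) = B^T h is B @ B.T;
    #    map the data with its matrix square root.
    H = H @ matrix_sqrt(B @ B.T)
    # 3) Apply an untrained random ReLU feature map.
    W = rng.normal(size=(H.shape[1], width)) / np.sqrt(H.shape[1])
    H = np.maximum(H @ W, 0.0)
    print(f"layer {layer + 1}: within-class variability = "
          f"{within_class_variability(H, y):.4f}")
```

In this toy setting the printed ratio tends to shrink across layers, which mirrors, in a very simplified form, the AGOP-driven within-class collapse the paper demonstrates for Deep RFM.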
Problem

Research questions and friction points this paper is trying to address.

Deep Neural Collapse
Rigid structure of last-layer representations
Data-agnostic explanations of DNC (e.g., the unconstrained features model)
Innovation

Methods, ideas, or system contributions that make the work stand out.

Average Gradient Outer Product (AGOP) as a collapse mechanism
Deep Recursive Feature Machine (Deep RFM)
Data-dependent explanation of DNC