Slowing Learning by Erasing Simple Features

📅 2025-02-05
🤖 AI Summary
This work investigates how low-order statistics—particularly quadratically accessible information—affect neural network learning dynamics, revealing their critical role in image classification. We propose QLEACE, the first closed-form quadratic concept erasure method, which precisely and efficiently removes such information under covariance constraints. Systematic evaluation against baselines—including LEACE, approximate quadratic erasure, and data augmentation—is conducted using Shannon information analysis. Results show that QLEACE consistently slows learning in feedforward networks, confirming the essential role of quadratic statistics in early-stage learning. In contrast, deeper architectures exploit spurious higher-order label correlations inadvertently introduced during erasure, demonstrating architectural differences in statistical adaptability. Notably, certain approximate erasure methods accelerate learning on specific datasets, exhibiting implicit data augmentation effects. This study establishes, for the first time, a causal link between statistical order and learning dynamics, offering a novel perspective on the mechanistic principles underlying neural network learning.

📝 Abstract
Prior work suggests that neural networks tend to learn low-order moments of the data distribution first, before moving on to higher-order correlations. In this work, we derive a novel closed-form concept erasure method, QLEACE, which surgically removes all quadratically available information about a concept from a representation. Through comparisons with linear erasure (LEACE) and two approximate forms of quadratic erasure, we explore whether networks can still learn when low-order statistics are removed from image classification datasets. We find that while LEACE consistently slows learning, quadratic erasure can exhibit both positive and negative effects on learning speed depending on the choice of dataset, model architecture, and erasure method. Use of QLEACE consistently slows learning in feedforward architectures, but more sophisticated architectures learn to use injected higher order Shannon information about class labels. Its approximate variants avoid injecting information, but surprisingly act as data augmentation techniques on some datasets, enhancing learning speed compared to LEACE.
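The abstract builds on linear concept erasure (LEACE), which removes all linearly available information about a label from a representation by zeroing the cross-covariance between features and labels in whitened feature space. The sketch below is a simplified illustration of that idea, not the paper's exact LEACE or QLEACE formulas; the function name `linear_erase` and the eigendecomposition-based whitening are illustrative choices.

```python
import numpy as np

def linear_erase(X, y):
    """Simplified linear concept erasure, in the spirit of LEACE.

    Removes the directions of whitened feature space that carry linear
    information about the label, so the cross-covariance between the
    erased features and the one-hot labels is zero (no linear probe can
    beat chance). Illustrative sketch only, not the published method.
    """
    X = X - X.mean(axis=0)                 # center features
    Z = np.eye(int(y.max()) + 1)[y]        # one-hot labels
    Z = Z - Z.mean(axis=0)

    cov = X.T @ X / len(X)
    # symmetric whitening / unwhitening via eigendecomposition
    w, V = np.linalg.eigh(cov)
    w = np.clip(w, 1e-12, None)            # guard tiny eigenvalues
    W = V @ np.diag(w ** -0.5) @ V.T       # whitening transform
    Winv = V @ np.diag(w ** 0.5) @ V.T     # unwhitening transform

    # cross-covariance between whitened features and labels
    sigma_xz = (W @ X.T) @ Z / len(X)
    # orthogonal projector onto the label-informative subspace
    U, _, _ = np.linalg.svd(sigma_xz, full_matrices=False)
    P = U @ U.T

    # delete those directions in whitened space, then unwhiten
    return X - (Winv @ P @ W @ X.T).T
```

QLEACE, as described here, extends this idea to quadratically available information (second-order statistics) under covariance constraints; the paper's contribution is a closed-form solution for that harder case.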
Problem

Research questions and friction points this paper is trying to address.

Do neural networks learn low-order moments of the data first?
Can quadratic information about a concept be removed exactly from a representation?
How does erasing low-order statistics affect learning speed?
Innovation

Methods, ideas, or system contributions that make the work stand out.

QLEACE: first closed-form method for quadratic concept erasure
Systematic comparison of linear and quadratic erasure effects on learning
Finding that approximate quadratic erasure can act as implicit data augmentation, speeding learning on some datasets