🤖 AI Summary
This work addresses the lack of a systematic, reproducible way to quantify memory effects in deep-neural-network training, i.e., how much a model's outcome depends on its training history. To this end, it proposes a unified measurement framework built on three components: causal estimands defined in function space, portable perturbation primitives, and an auditing protocol based on order hashes, buffer/BN checksums, and explicit RNG contracts. By combining seed-paired runs, manipulation of optimizer state (momentum, Adam moments, EMA, and batch-normalization resets), data-order window swaps, and teacher/queue adjustments, the approach yields causal, uncertainty-aware estimates of the influence of training history, enabling reproducible comparisons across models, datasets, and training regimes.
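To make the idea concrete, here is a minimal sketch of one perturbation primitive on a toy problem: branch a training run at a checkpoint, zero the optimizer's momentum buffer in one branch while carrying it in the other, and read off the divergence between the branches as the effect of the carried state. The toy quadratic loss, the helper names, and all hyperparameters are illustrative assumptions, not the paper's setup.

```python
# Toy sketch (hypothetical setup): carry-vs-reset of SGD momentum as a
# perturbation primitive. Both branches share the trajectory up to the
# branch point; the only intervention is zeroing the momentum buffer.

def sgd_momentum(w, v, grad, lr=0.1, beta=0.9):
    """One SGD-with-momentum step on scalar weight w; returns (w, v)."""
    v = beta * v + grad(w)
    w = w - lr * v
    return w, v

def run(steps, branch_at, reset_momentum):
    grad = lambda w: 2.0 * (w - 3.0)  # gradient of toy loss (w - 3)^2
    w, v = 0.0, 0.0
    for t in range(steps):
        if t == branch_at and reset_momentum:
            v = 0.0                   # the "reset" perturbation primitive
        w, v = sgd_momentum(w, v, grad)
    return w

carry = run(steps=20, branch_at=10, reset_momentum=False)
reset = run(steps=20, branch_at=10, reset_momentum=True)
effect = abs(carry - reset)  # effect of carried momentum on the final weight
```

In the framework's terms, `effect` is a (one-dimensional stand-in for a) function-space contrast between two runs that differ only in one piece of carried optimizer state; the same branch-and-intervene pattern applies to Adam moments, EMA weights, or BN statistics.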
📝 Abstract
Modern deep-learning training is not memoryless. Updates depend on optimizer moments and averaging, data-order policies (random reshuffling vs. with-replacement sampling, staged augmentations and replay), the nonconvex optimization path, and auxiliary state (teacher EMA/SWA, contrastive queues, BatchNorm statistics). This survey organizes these mechanisms by source, lifetime, and visibility. It introduces seed-paired, function-space causal estimands; portable perturbation primitives (carry/reset of momentum/Adam/EMA/BN, order-window swaps, queue/teacher tweaks); and a reporting checklist with audit artifacts (order hashes, buffer/BN checksums, RNG contracts). It concludes with a protocol for portable, causal, uncertainty-aware measurement that attributes how much training history matters across models, data, and regimes.
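Two of the audit artifacts named above can be sketched with standard-library tools alone: a digest of the exact sample order a run consumed, and a checksum over an auxiliary state buffer, with the shuffle pinned to a declared seed so the order is reproducible from the audit record. The helper names and the eight-sample dataset are assumptions for illustration.

```python
# Sketch (hypothetical helpers) of audit artifacts: an order hash over the
# sample indices a run actually saw, and a checksum over an auxiliary
# buffer (momentum, EMA, or BN running statistics).
import hashlib
import random
import struct

def order_hash(indices):
    """SHA-256 digest of the exact data order consumed by a run."""
    h = hashlib.sha256()
    for i in indices:
        h.update(struct.pack("<q", i))  # fixed-width little-endian encoding
    return h.hexdigest()

def buffer_checksum(values):
    """SHA-256 digest of a float buffer, e.g. BN running statistics."""
    h = hashlib.sha256()
    for x in values:
        h.update(struct.pack("<d", x))
    return h.hexdigest()

# RNG contract: the shuffle is fully determined by a declared seed, so the
# order hash can be re-derived and verified from the audit record alone.
rng = random.Random(1234)
order = list(range(8))
rng.shuffle(order)
epoch_digest = order_hash(order)
```

Because the digests are over fixed-width byte encodings rather than printed floats, two runs claiming identical training history can be compared exactly, and even a tiny drift in a BN buffer changes its checksum.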