🤖 AI Summary
This work addresses the lack of a systematic, reproducible way to quantify memory effects in deep-neural-network training, i.e., how much a model's outcome depends on its training history. To this end, it proposes a unified measurement framework built on three components: causal estimands defined in function space, portable perturbation primitives, and an auditing protocol based on order hashes, buffer/BN checksums, and explicit RNG contracts. By combining seed-paired runs, manipulation of optimizer state (momentum, Adam moments, EMA, and batch-normalization resets), data-order window swaps, and teacher/queue adjustments, the approach yields causal, uncertainty-aware estimates of the influence of training history, enabling reproducible comparisons across models, datasets, and training regimes.
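To make the idea concrete, here is a minimal sketch of one perturbation primitive on a toy problem: branch a training run at a checkpoint, zero the optimizer's momentum buffer in one branch while carrying it in the other, and read off the divergence between the branches as the effect of the carried state. The toy quadratic loss, the helper names, and all hyperparameters are illustrative assumptions, not the paper's setup.

```python
# Toy sketch (hypothetical setup): carry-vs-reset of SGD momentum as a
# perturbation primitive. Both branches share the trajectory up to the
# branch point; the only intervention is zeroing the momentum buffer.

def sgd_momentum(w, v, grad, lr=0.1, beta=0.9):
    """One SGD-with-momentum step on scalar weight w; returns (w, v)."""
    v = beta * v + grad(w)
    w = w - lr * v
    return w, v

def run(steps, branch_at, reset_momentum):
    grad = lambda w: 2.0 * (w - 3.0)  # gradient of toy loss (w - 3)^2
    w, v = 0.0, 0.0
    for t in range(steps):
        if t == branch_at and reset_momentum:
            v = 0.0                   # the "reset" perturbation primitive
        w, v = sgd_momentum(w, v, grad)
    return w

carry = run(steps=20, branch_at=10, reset_momentum=False)
reset = run(steps=20, branch_at=10, reset_momentum=True)
effect = abs(carry - reset)  # effect of carried momentum on the final weight
```

In the framework's terms, `effect` is a (one-dimensional stand-in for a) function-space contrast between two runs that differ only in one piece of carried optimizer state; the same branch-and-intervene pattern applies to Adam moments, EMA weights, or BN statistics.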
📝 Abstract
Modern deep-learning training is not memoryless. Updates depend on optimizer moments and averaging, data-order policies (random reshuffling vs. with-replacement sampling, staged augmentations and replay), the nonconvex optimization path, and auxiliary state (teacher EMA/SWA, contrastive queues, BatchNorm statistics). This survey organizes these mechanisms by source, lifetime, and visibility. It introduces seed-paired, function-space causal estimands; portable perturbation primitives (carry/reset of momentum/Adam/EMA/BN, order-window swaps, queue/teacher tweaks); and a reporting checklist with audit artifacts (order hashes, buffer/BN checksums, RNG contracts). It concludes with a protocol for portable, causal, uncertainty-aware measurement that attributes how much training history matters across models, data, and regimes.
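Two of the audit artifacts named above can be sketched with standard-library tools alone: a digest of the exact sample order a run consumed, and a checksum over an auxiliary state buffer, with the shuffle pinned to a declared seed so the order is reproducible from the audit record. The helper names and the eight-sample dataset are assumptions for illustration.

```python
# Sketch (hypothetical helpers) of audit artifacts: an order hash over the
# sample indices a run actually saw, and a checksum over an auxiliary
# buffer (momentum, EMA, or BN running statistics).
import hashlib
import random
import struct

def order_hash(indices):
    """SHA-256 digest of the exact data order consumed by a run."""
    h = hashlib.sha256()
    for i in indices:
        h.update(struct.pack("<q", i))  # fixed-width little-endian encoding
    return h.hexdigest()

def buffer_checksum(values):
    """SHA-256 digest of a float buffer, e.g. BN running statistics."""
    h = hashlib.sha256()
    for x in values:
        h.update(struct.pack("<d", x))
    return h.hexdigest()

# RNG contract: the shuffle is fully determined by a declared seed, so the
# order hash can be re-derived and verified from the audit record alone.
rng = random.Random(1234)
order = list(range(8))
rng.shuffle(order)
epoch_digest = order_hash(order)
```

Because the digests are over fixed-width byte encodings rather than printed floats, two runs claiming identical training history can be compared exactly, and even a tiny drift in a BN buffer changes its checksum.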