AI Summary
Traditional layer freezing still requires forward propagation through frozen layers, limiting computational efficiency gains; while feature map caching holds promise, it faces two overlooked challenges: augmentation invalidation and substantial memory overhead. This paper proposes the first systematic framework to address these issues: (1) a similarity-aware channel-level data augmentation strategy that mitigates distribution shift in cached features, and (2) a lossy progressive compression scheme that significantly reduces storage cost without compromising accuracy. Experiments across diverse models (ResNet, ViT) and benchmarks (ImageNet, CIFAR) demonstrate that the approach reduces training FLOPs by 32–47%, decreases GPU memory consumption by 58–73%, and incurs only marginal accuracy degradation (0.1–0.3%). These results validate the method's efficiency, robustness, and scalability.
Abstract
With the growing size of deep neural networks and datasets, the computational cost of training has increased significantly. Layer freezing has recently attracted great attention as a promising method to reduce the cost of network training. However, in traditional layer-freezing methods, forward propagation through frozen layers is still required to generate feature maps for the unfrozen layers, limiting the achievable savings. To overcome this, prior works proposed caching the feature maps produced by frozen layers as a new dataset, allowing later layers to train directly on the stored feature maps. While this approach appears straightforward, it presents several major challenges that prior literature has largely overlooked, such as how to effectively apply augmentations to cached feature maps and the substantial storage overhead the cache introduces. Left unaddressed, these challenges severely degrade the performance of the caching method and can even render it infeasible. This paper is the first to comprehensively explore these challenges and provide a systematic solution. To preserve training accuracy, we propose *similarity-aware channel augmentation*, which caches the channels with high augmentation sensitivity at minimal additional storage cost. To mitigate storage overhead, we incorporate lossy data compression into layer freezing and design a *progressive compression* strategy that increases the compression rate as more layers are frozen, effectively reducing storage costs. Our solution achieves significant reductions in training cost while maintaining model accuracy, with only minor time overhead. Additionally, we conduct a comprehensive evaluation of freezing and compression strategies, providing insights into optimizing their application for efficient DNN training.
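The two mechanisms named in the abstract can be illustrated with a minimal sketch. The abstract does not specify how augmentation sensitivity or the compression schedule are computed, so the per-channel cosine-similarity score, the storage-budget channel selection, and the linear bit-width schedule below are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np


def channel_aug_sensitivity(feat_clean, feat_aug):
    """Per-channel cosine similarity between clean and augmented
    feature maps of shape [C, H, W]. Low similarity means the channel
    is highly sensitive to augmentation (assumed scoring rule)."""
    c = feat_clean.shape[0]
    f1 = feat_clean.reshape(c, -1)
    f2 = feat_aug.reshape(c, -1)
    num = (f1 * f2).sum(axis=1)
    den = np.linalg.norm(f1, axis=1) * np.linalg.norm(f2, axis=1) + 1e-8
    return num / den  # shape [C]


def select_sensitive_channels(feat_clean, feat_aug, budget=0.25):
    """Pick the most augmentation-sensitive channels to cache in
    augmented form, capped by a storage budget (fraction of channels)."""
    sim = channel_aug_sensitivity(feat_clean, feat_aug)
    k = max(1, int(budget * sim.size))
    return np.argsort(sim)[:k]  # lowest similarity = most sensitive


def progressive_bits(num_frozen, total_layers, max_bits=8, min_bits=2):
    """Progressive compression: the quantization bit-width shrinks
    (i.e., the compression rate grows) as more layers are frozen.
    A linear schedule is assumed here for illustration."""
    frac = num_frozen / total_layers
    return int(round(max_bits - frac * (max_bits - min_bits)))


def quantize(feat, bits):
    """Uniform lossy quantization of a cached feature map to `bits` bits,
    returning the codes plus the parameters needed to dequantize."""
    lo, hi = feat.min(), feat.max()
    levels = (1 << bits) - 1
    codes = np.round((feat - lo) / (hi - lo + 1e-8) * levels)
    return codes.astype(np.uint8), (lo, hi, levels)
```

For example, a run with 12 total layers would cache full-precision (8-bit) features while nothing is frozen and drop to 2-bit codes once all layers are frozen, while only the channels whose activations change most under augmentation are stored in augmented form.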