Energy Consumption in Parallel Neural Network Training

📅 2025-08-11
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study addresses energy consumption optimization in parallel neural network training. We systematically quantify the impact of GPU count and of global and local batch size on energy consumption, training time, and model accuracy, based on data-parallel training of ResNet50 and FourCastNet across multi-GPU clusters. Results show that total energy scales approximately linearly with GPU-hours; however, energy efficiency—measured in samples or gradient updates per GPU-hour—is strongly modulated by model architecture, hardware platform, and training dynamics. Crucially, the effective energy-scaling factor differs substantially across distinct AI workloads. To support reproducible evaluation, the experiments serve as a green-AI-oriented energy-efficiency benchmark. These empirical findings provide quantitative, evidence-based guidance for sustainable AI system design and resource-aware scheduling in large-scale distributed training.

📝 Abstract
The increasing computational demand of training neural networks leads to a concerning growth in energy consumption. While parallelization has enabled scaling up model and dataset sizes and accelerated training, its impact on energy consumption is often overlooked. To close this research gap, we conducted scaling experiments for data-parallel training of two models, ResNet50 and FourCastNet, and evaluated the impact of parallelization parameters, i.e., GPU count, global batch size, and local batch size, on predictive performance, training time, and energy consumption. We show that energy consumption scales approximately linearly with the consumed resources, i.e., GPU-hours; however, the respective scaling factor differs substantially between distinct model trainings and hardware, and is systematically influenced by the number of samples and gradient updates per GPU-hour. Our results shed light on the complex interplay of scaling up neural network training and can inform future developments towards more sustainable AI research.
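The relationship the abstract describes can be sketched as a simple linear model: total energy ≈ c × GPU-hours, where the factor c varies with model, hardware, and throughput. The following Python sketch illustrates how the quantities compared in the paper (GPU-hours, total energy, samples per GPU-hour) relate; the function names and all numeric values are illustrative assumptions, not measurements from the paper.

```python
# Illustrative sketch of the linear energy model described in the abstract:
# total energy scales with GPU-hours, with a workload-dependent factor.
# All names and numbers here are hypothetical, for exposition only.

def gpu_hours(num_gpus: int, wall_time_h: float) -> float:
    """Consumed resources in GPU-hours (GPU count x wall-clock hours)."""
    return num_gpus * wall_time_h

def total_energy_kwh(resource_gpu_hours: float, kwh_per_gpu_hour: float) -> float:
    """Linear model: energy = scaling factor x GPU-hours."""
    return kwh_per_gpu_hour * resource_gpu_hours

def samples_per_gpu_hour(num_samples: int, resource_gpu_hours: float) -> float:
    """Throughput-style efficiency metric used to compare workloads."""
    return num_samples / resource_gpu_hours

# Example: 16 GPUs training for 2 hours, assuming 0.3 kWh per GPU-hour.
gh = gpu_hours(16, 2.0)                      # 32.0 GPU-hours
energy = total_energy_kwh(gh, 0.3)           # 9.6 kWh under the assumed factor
eff = samples_per_gpu_hour(1_280_000, gh)    # 40000.0 samples per GPU-hour
```

Under this framing, two trainings with identical GPU-hours can differ widely in energy per useful unit of work, which is why the paper compares samples and gradient updates per GPU-hour rather than GPU-hours alone.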
Problem

Research questions and friction points this paper is trying to address.

Impact of parallelization on neural network training energy consumption
Linear scaling of energy with GPU hours in parallel training
Influence of batch size and GPU count on training sustainability
Innovation

Methods, ideas, or system contributions that make the work stand out.

Evaluates the impact of parallelization parameters on energy consumption
Observes approximately linear energy scaling with GPU-hours
Identifies factors (model, hardware, throughput) behind differing scaling factors