🤖 AI Summary
The environmental impact of AI image generation remains poorly quantified, particularly regarding per-image energy consumption across diverse models. Method: This study systematically measures and analyzes the energy consumption of 17 state-of-the-art text-to-image models via controlled hardware power profiling, varying architectural choices (U-Net vs. Transformer), weight quantization, output resolution, and prompt length, while evaluating image quality using FID and CLIP Score. Contribution/Results: (1) Per-image energy varies by up to 46× across models; (2) U-Net-based models are significantly more energy-efficient than Transformer-based ones; (3) weight quantization degrades energy efficiency; (4) resolution exhibits non-monotonic effects on energy (1.3–4.7× variation), whereas prompt length shows no statistically significant impact; (5) high fidelity (FID < 10) and low energy consumption are jointly achievable, enabling identification of Pareto-optimal models. These findings establish an empirical benchmark and actionable optimization guidelines for green AI image generation.
📝 Abstract
With the growing adoption of AI image generation, in conjunction with the ever-increasing environmental resources demanded by AI, we are urged to answer a fundamental question: What is the environmental impact hidden behind each image we generate? In this research, we present a comprehensive empirical experiment designed to assess the energy consumption of AI image generation. Our experiment compares 17 state-of-the-art image generation models by considering multiple factors that could affect their energy consumption, such as model quantization, image resolution, and prompt length. Additionally, we consider established image quality metrics to study potential trade-offs between energy consumption and generated image quality. Results show that image generation models vary drastically in the energy they consume, with up to a 46x difference. Image resolution affects energy consumption inconsistently, ranging from a 1.3x to 4.7x increase when doubling resolution. U-Net-based models tend to consume less than Transformer-based ones. Model quantization, instead, tends to deteriorate the energy efficiency of most models, while prompt length and content have no statistically significant impact. Improving image quality does not always come at the cost of higher energy consumption: some of the models producing the highest-quality images are also among the most energy-efficient ones.
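The per-image energy figures discussed above come from hardware power profiling. As a minimal illustration of the underlying arithmetic (not the study's actual tooling), the sketch below assumes a hypothetical list of `(timestamp_s, power_w)` samples, as a power meter or GPU telemetry might report, and integrates power over time with the trapezoidal rule to obtain energy in joules:

```python
def energy_joules(samples):
    """Estimate energy (J) from (timestamp_s, power_w) samples sorted by time.

    Uses trapezoidal integration: each interval contributes the average
    power of its endpoints times its duration (W * s = J).
    """
    total = 0.0
    for (t0, p0), (t1, p1) in zip(samples, samples[1:]):
        total += (p0 + p1) / 2.0 * (t1 - t0)
    return total

# Hypothetical example: a constant 200 W draw sampled over 10 seconds.
readings = [(0.0, 200.0), (5.0, 200.0), (10.0, 200.0)]
print(energy_joules(readings))  # 2000.0 J
```

Dividing such a total by the number of images generated during the measurement window gives the per-image energy that the comparisons (e.g., the 46x spread across models) are based on.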