🤖 AI Summary
Stable Diffusion models face dual challenges in societal fairness (e.g., gender/racial bias) and environmental sustainability (high energy consumption). To address these without model fine-tuning or architectural modifications, this paper proposes the first search-based co-optimization framework that jointly optimizes hyperparameters and prompt structures. It treats bias metrics—quantified via standardized fairness evaluations—and empirically measured CPU+GPU energy consumption as competing objectives within a Pareto optimization setting, while constraining image quality to remain perceptually comparable to baseline outputs. An empirical evaluation against six baselines demonstrates significant improvements over the original model: a 68% reduction in gender bias, a 59% reduction in racial bias, and a 48% decrease in total computational energy consumption. Crucially, the optimized configurations generalize across diverse prompts and remain stable across multiple inference runs, supporting the method's practicality for real-world deployment under fairness and sustainability constraints.
📝 Abstract
Background: Text-to-image generation models are widely used across numerous domains. Among these models, Stable Diffusion (SD), an open-source text-to-image generation model, has become the most popular, producing over 12 billion images annually. However, the widespread use of these models raises concerns regarding their social and environmental sustainability.
Aims: To reduce the harm that SD models may have on society and the environment, we introduce SustainDiffusion, a search-based approach designed to enhance the social and environmental sustainability of SD models.
Method: SustainDiffusion searches for the optimal combination of hyperparameters and prompt structures that reduces gender and ethnic bias in generated images while also lowering the energy consumption required for image generation. Importantly, SustainDiffusion maintains image quality comparable to that of the original SD model.
Results: We conduct a comprehensive empirical evaluation of SustainDiffusion, testing it against six different baselines using 56 different prompts. Our results demonstrate that SustainDiffusion can reduce gender bias in SD3 by 68%, ethnic bias by 59%, and energy consumption (calculated as the sum of CPU and GPU energy) by 48%. Additionally, the outcomes produced by SustainDiffusion are consistent across multiple runs and can be generalised to various prompts.
Conclusions: With SustainDiffusion, we demonstrate how enhancing the social and environmental sustainability of text-to-image generation models is possible without fine-tuning or changing the model's architecture.
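The core idea of such a search can be illustrated with a minimal sketch. The code below is a hypothetical, simplified random search over an assumed SD configuration space (inference steps, guidance scale, prompt template); the objective functions are stand-in stubs, not the paper's actual bias, energy, or quality measurements. It shows the Pareto mechanics: candidates failing a quality floor are discarded, and the remaining (bias, energy) points are kept only if no other point dominates them.

```python
import random

# Hypothetical sketch, NOT the paper's implementation: random search over
# assumed SD knobs, keeping a bias/energy Pareto front under a quality floor.
random.seed(0)

SEARCH_SPACE = {
    "steps": [10, 20, 30, 50],       # inference steps (assumed knob)
    "guidance": [3.0, 5.0, 7.5],     # classifier-free guidance scale
    "template": ["a photo of a {}", "a portrait of a {}"],  # prompt structure
}

def sample_config():
    """Draw one random configuration from the search space."""
    return {k: random.choice(v) for k, v in SEARCH_SPACE.items()}

def evaluate(cfg):
    """Stand-in for real measurements: (bias score, energy in J, quality)."""
    bias = random.uniform(0, 1) * (cfg["guidance"] / 7.5)
    energy = cfg["steps"] * random.uniform(8, 12)  # more steps -> more energy
    quality = random.uniform(0.6, 1.0)
    return bias, energy, quality

def dominates(a, b):
    """a dominates b if no worse on both objectives and strictly better on one."""
    return a[0] <= b[0] and a[1] <= b[1] and (a[0] < b[0] or a[1] < b[1])

def search(budget=200, quality_floor=0.7):
    """Return a Pareto front of ((bias, energy), config) pairs."""
    front = []
    for _ in range(budget):
        cfg = sample_config()
        bias, energy, quality = evaluate(cfg)
        if quality < quality_floor:  # keep quality near the baseline
            continue
        point = (bias, energy)
        if any(dominates(p, point) for p, _ in front):
            continue  # dominated by an existing solution
        # drop solutions that the new point dominates, then keep it
        front = [(p, c) for p, c in front if not dominates(point, p)]
        front.append((point, cfg))
    return front
```

In the paper's actual setting, the stub objectives would be replaced by standardized fairness evaluations of the generated images and measured CPU+GPU energy, with image quality constrained against the original SD outputs.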