🤖 AI Summary
Deep reinforcement learning (DRL) training incurs substantial energy consumption, raising environmental and economic sustainability concerns, yet existing research prioritizes algorithmic performance while neglecting systematic quantification of energy use, carbon emissions, and monetary cost. Method: This work presents the first empirical, multi-dimensional sustainability evaluation of seven mainstream DRL algorithms (DQN, TRPO, A2C, ARS, PPO, RecurrentPPO, and QR-DQN) across ten Atari 2600 games, grounded in real-time power measurements. Leveraging Stable Baselines implementations and U.S. national averages for electricity price and grid carbon intensity, we uniformly quantify energy consumption, carbon footprint, and training cost. Contribution/Results: We identify algorithm choices that maintain competitive learning performance while consuming up to 24% less energy (ARS vs. DQN) and emitting nearly 68% less CO2 at nearly 68% lower training cost (QR-DQN vs. RecurrentPPO) than less efficient counterparts. This study establishes an empirically grounded energy-efficiency benchmark and algorithm-selection guidelines for green DRL.
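To make the accounting concrete, below is a minimal sketch of the energy-to-carbon-to-cost conversion the study performs. The price and grid-intensity constants are placeholder U.S. averages we assume for illustration, not figures taken from the paper, and `summarize_run` is a hypothetical helper name.

```python
# Illustrative sketch: converting sampled power draw into energy, CO2e, and cost.
# The two constants are assumed placeholder U.S. national averages, not the paper's values.
US_PRICE_USD_PER_KWH = 0.16      # assumed average retail electricity price
US_GRID_KG_CO2E_PER_KWH = 0.37   # assumed average grid carbon intensity

def summarize_run(power_samples_w, interval_s):
    """Integrate power samples (watts) taken every `interval_s` seconds."""
    energy_kwh = sum(power_samples_w) * interval_s / 3_600_000  # W*s -> kWh
    return {
        "energy_kwh": energy_kwh,
        "co2e_kg": energy_kwh * US_GRID_KG_CO2E_PER_KWH,
        "cost_usd": energy_kwh * US_PRICE_USD_PER_KWH,
    }

# e.g., one hour at a steady 250 W, sampled once per second -> 0.25 kWh
print(summarize_run([250.0] * 3600, interval_s=1.0))
```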
📝 Abstract
The growing computational demands of deep reinforcement learning (DRL) have raised concerns about the environmental and economic costs of training large-scale models. While algorithmic efficiency in terms of learning performance has been extensively studied, the energy requirements, greenhouse gas emissions, and monetary costs of DRL algorithms remain largely unexplored. In this work, we present a systematic benchmarking study of the energy consumption of seven state-of-the-art DRL algorithms, namely DQN, TRPO, A2C, ARS, PPO, RecurrentPPO, and QR-DQN, implemented using Stable Baselines. Each algorithm was trained for one million steps on each of ten Atari 2600 games, and power consumption was measured in real time to estimate total energy usage, CO2-equivalent emissions, and electricity cost based on the U.S. national average electricity price. Our results reveal substantial variation in energy efficiency and training cost across algorithms, with some achieving comparable performance while consuming up to 24% less energy (ARS vs. DQN), emitting nearly 68% less CO2, and incurring almost 68% lower monetary cost (QR-DQN vs. RecurrentPPO) than less efficient counterparts. We further analyze the trade-offs between learning performance, training time, energy use, and financial cost, highlighting cases where algorithmic choices can mitigate environmental and economic impact without sacrificing learning performance. This study provides actionable insights for developing energy-aware and cost-efficient DRL practices and establishes a foundation for incorporating sustainability considerations into future algorithmic design and evaluation.
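The experimental setup can be sketched with off-the-shelf tools. Below is a minimal, hypothetical reconstruction of one benchmark run, pairing a Stable-Baselines3 DQN agent on Breakout with a background thread that samples GPU power via pynvml. The paper does not name its measurement tooling, so the sampler, the 1 s interval, and the environment id are our assumptions, and GPU-only sampling undercounts CPU and DRAM draw.

```python
# Sketch of one benchmark run under assumed tooling: train a Stable-Baselines3
# agent on an Atari game while sampling GPU power with NVML (pynvml).
import threading
import time

import pynvml
from stable_baselines3 import DQN
from stable_baselines3.common.env_util import make_atari_env
from stable_baselines3.common.vec_env import VecFrameStack

samples_w, stop = [], threading.Event()

def sample_power(handle, interval_s=1.0):
    # nvmlDeviceGetPowerUsage reports milliwatts; convert to watts.
    while not stop.is_set():
        samples_w.append(pynvml.nvmlDeviceGetPowerUsage(handle) / 1000)
        time.sleep(interval_s)

pynvml.nvmlInit()
sampler = threading.Thread(
    target=sample_power, args=(pynvml.nvmlDeviceGetHandleByIndex(0),), daemon=True
)

# Standard Atari preprocessing: wrapped env with 4-frame stacking.
env = VecFrameStack(make_atari_env("BreakoutNoFrameskip-v4", n_envs=1), n_stack=4)
model = DQN("CnnPolicy", env, verbose=0)

sampler.start()
model.learn(total_timesteps=1_000_000)  # one million steps, as in the study
stop.set()
sampler.join()

energy_kwh = sum(samples_w) / 3_600_000  # 1 s samples: W*s -> kWh
print(f"GPU energy for this run: {energy_kwh:.3f} kWh")
```

Repeating such a run per algorithm and game, then feeding the sampled power into a conversion like the one sketched under the AI Summary, yields the per-algorithm energy, CO2e, and cost comparisons the abstract describes.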