Greener Deep Reinforcement Learning: Analysis of Energy and Carbon Efficiency Across Atari Benchmarks

📅 2025-09-05
🤖 AI Summary
Deep reinforcement learning (DRL) training incurs substantial energy consumption, raising environmental and economic sustainability concerns—yet existing research prioritizes algorithmic performance while neglecting systematic quantification of energy use, carbon emissions, and monetary cost. Method: This work presents the first empirical, multi-dimensional sustainability evaluation of seven mainstream DRL algorithms—including DQN, PPO, and A2C—across ten Atari games, grounded in real-time power measurements. Leveraging Stable Baselines implementations and U.S. national averages for electricity price and grid carbon intensity, we uniformly quantify energy consumption, carbon footprint, and training cost. Contribution/Results: We identify algorithm choices that maintain competitive learning performance while reducing energy use by 24% (ARS vs. DQN) and cutting carbon emissions and training cost by 68% (QR-DQN vs. RecurrentPPO) relative to less efficient counterparts. This study establishes the first empirically grounded energy-efficiency benchmark and algorithm-selection guideline for green DRL.

📝 Abstract
The growing computational demands of deep reinforcement learning (DRL) have raised concerns about the environmental and economic costs of training large-scale models. While algorithmic efficiency in terms of learning performance has been extensively studied, the energy requirements, greenhouse gas emissions, and monetary costs of DRL algorithms remain largely unexplored. In this work, we present a systematic benchmarking study of the energy consumption of seven state-of-the-art DRL algorithms, namely DQN, TRPO, A2C, ARS, PPO, RecurrentPPO, and QR-DQN, implemented using Stable Baselines. Each algorithm was trained for one million steps on each of ten Atari 2600 games, and power consumption was measured in real time to estimate total energy usage, CO2-equivalent emissions, and electricity cost based on the U.S. national average electricity price. Our results reveal substantial variation in energy efficiency and training cost across algorithms, with some achieving comparable performance while consuming up to 24% less energy (ARS vs. DQN), emitting nearly 68% less CO2, and incurring almost 68% lower monetary cost (QR-DQN vs. RecurrentPPO) than less efficient counterparts. We further analyze the trade-offs between learning performance, training time, energy use, and financial cost, highlighting cases where algorithmic choices can mitigate environmental and economic impact without sacrificing learning performance. This study provides actionable insights for developing energy-aware and cost-efficient DRL practices and establishes a foundation for incorporating sustainability considerations into future algorithmic design and evaluation.
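The first step of the pipeline the abstract describes — sampling power in real time and integrating it into total training energy — can be sketched in a few lines. The function name, sampling interval, and power values below are illustrative assumptions, not the paper's actual measurement setup.

```python
def energy_kwh(power_samples_w, interval_s):
    """Integrate real-time power samples (watts) into energy in kWh.

    Uses a simple rectangle rule: each sample is assumed to hold for one
    sampling interval. Illustrative sketch, not the paper's exact method.
    """
    joules = sum(power_samples_w) * interval_s  # W x s = J
    return joules / 3.6e6                       # 1 kWh = 3.6e6 J

# Example: a device drawing a steady ~250 W, sampled once per second for an hour
samples = [250.0] * 3600
total = energy_kwh(samples, 1.0)  # 0.25 kWh
```

In practice the per-sample power would come from a hardware or software power meter rather than a constant list.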
Problem

Research questions and friction points this paper is trying to address.

Assessing energy consumption of DRL algorithms
Measuring CO2 emissions from training models
Evaluating monetary costs of reinforcement learning
Innovation

Methods, ideas, or system contributions that make the work stand out.

Benchmarked energy consumption of seven DRL algorithms
Measured real-time power usage and CO2 emissions
Identified energy-efficient algorithms reducing costs by 68%
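The paper converts measured training energy into CO2-equivalent emissions and dollar cost using U.S. national averages. A minimal sketch of that conversion follows; the two constants are placeholder ballpark values, not the paper's figures.

```python
# Placeholder U.S. averages -- illustrative values, not the paper's figures.
GRID_CO2_KG_PER_KWH = 0.4   # grid carbon intensity (kg CO2e per kWh)
PRICE_USD_PER_KWH = 0.16    # average retail electricity price (USD per kWh)

def training_footprint(energy_kwh):
    """Convert measured training energy (kWh) into CO2e (kg) and cost (USD)."""
    co2_kg = energy_kwh * GRID_CO2_KG_PER_KWH
    cost_usd = energy_kwh * PRICE_USD_PER_KWH
    return co2_kg, cost_usd

co2, cost = training_footprint(10.0)  # e.g., a 10 kWh training run
```

Swapping in region-specific carbon intensity and electricity price would adapt the same accounting to other grids.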
Jason Gardner
University of North Florida, Jacksonville, FL, USA

Ayan Dutta
University of North Florida
Robotics, Artificial Intelligence, Graph Theory, Game Theory

Swapnoneel Roy
Full Professor, University of North Florida
Computer Security, Energy-Aware Computing, Algorithms

O. Patrick Kreidl
University of North Florida, Jacksonville, FL, USA

Ladislau Boloni
University of Central Florida, Orlando, FL, USA