AI Summary
Large language models (LLMs) incur substantial energy consumption and carbon emissions, hindering sustainable software development. Method: This study investigates prompt engineering as a green optimization lever for small language models (SLMs) in code generation, evaluating four open-source SLMs (Qwen2.5-Coder, StableCode-3B, CodeLlama-7B, and Phi-3-Mini-4K) on LeetCode Python tasks under role-based, zero-shot, few-shot, and chain-of-thought (CoT) prompting. For each generated solution we empirically measure runtime, memory footprint, and energy consumption. Contribution/Results: CoT prompting significantly reduces energy consumption for certain SLMs, with strong model specificity: Qwen2.5-Coder and StableCode-3B achieve consistent energy savings that surpass the human-written baseline, whereas CodeLlama-7B and Phi-3-Mini-4K fail to match it. This work is the first to empirically demonstrate CoT as an effective, energy-efficient prompting paradigm for SLMs in code generation, establishing a reproducible, low-power pathway for AI-assisted programming.
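The four prompting strategies can be made concrete with template sketches. The exact prompt wording used in the study is not specified here, so the templates below are illustrative assumptions only, shown for a LeetCode-style problem statement:

```python
# Illustrative templates for the four prompting strategies compared in the
# study. The wording is a hypothetical example, not the authors' prompts.
PROBLEM = (
    "Given an array of integers nums and an integer target, return the "
    "indices of the two numbers that add up to target."
)

PROMPTS = {
    # Role prompting: assign the model a persona before the task.
    "role": (
        "You are an expert Python programmer.\n"
        f"Solve the following problem:\n{PROBLEM}"
    ),
    # Zero-shot: the task alone, no examples or persona.
    "zero_shot": f"Solve the following problem in Python:\n{PROBLEM}",
    # Few-shot: prepend one or more worked examples.
    "few_shot": (
        "Example problem: reverse a string.\n"
        "Example solution:\n"
        "def reverse(s):\n    return s[::-1]\n\n"
        f"Now solve the following problem in Python:\n{PROBLEM}"
    ),
    # Chain-of-thought: ask for step-by-step reasoning before the code.
    "chain_of_thought": (
        f"Solve the following problem in Python:\n{PROBLEM}\n"
        "Think step by step: first outline your approach, then write the code."
    ),
}
```

Each template would be sent to every model for every benchmark problem, so that measured differences in energy use can be attributed to the prompting strategy rather than the task.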
Abstract
There is growing concern about the environmental impact of large language models (LLMs) in software development, particularly due to their high energy use and carbon footprint. Small language models (SLMs) offer a more sustainable alternative, requiring fewer computational resources while remaining effective for fundamental programming tasks. In this study, we investigate whether prompt engineering can improve the energy efficiency of SLMs in code generation. We evaluate four open-source SLMs, StableCode-Instruct-3B, Qwen2.5-Coder-3B-Instruct, CodeLlama-7B-Instruct, and Phi-3-Mini-4K-Instruct, on 150 Python problems from LeetCode, evenly distributed across the easy, medium, and hard categories. Each model is tested under four prompting strategies: role prompting, zero-shot, few-shot, and chain-of-thought (CoT). For every generated solution, we measure runtime, memory usage, and energy consumption, comparing the results against a human-written baseline. Our findings show that CoT prompting provides consistent energy savings for Qwen2.5-Coder and StableCode-3B, while CodeLlama-7B and Phi-3-Mini-4K fail to outperform the baseline under any prompting strategy. These results highlight that the benefits of prompting are model-dependent and that carefully designed prompts can guide SLMs toward greener software development.
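The per-solution measurement step can be sketched with the standard library alone. The sketch below captures runtime and peak memory for a generated solution; energy measurement requires hardware counters (e.g. Intel RAPL) and is deliberately omitted here, so this is a minimal illustration of the profiling loop, not the study's instrumentation:

```python
# Minimal sketch of per-solution profiling: runtime via a high-resolution
# timer, peak memory via tracemalloc. Energy measurement (hardware-counter
# based, e.g. RAPL) is outside the standard library and omitted.
import time
import tracemalloc

def profile_solution(solve, test_cases):
    """Run a generated solution against (args, expected) pairs and
    return (total runtime in seconds, peak traced memory in bytes)."""
    tracemalloc.start()
    start = time.perf_counter()
    for args, expected in test_cases:
        assert solve(*args) == expected, "generated solution is incorrect"
    runtime = time.perf_counter() - start
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return runtime, peak

# Example: profiling a LeetCode-style "two sum" solution.
def two_sum(nums, target):
    seen = {}
    for i, x in enumerate(nums):
        if target - x in seen:
            return [seen[target - x], i]
        seen[x] = i

runtime, peak = profile_solution(
    two_sum, [(([2, 7, 11, 15], 9), [0, 1])]
)
```

In the study's setting, the same harness would be applied to the human-written baseline solution for each problem, so that a model's prompting strategy can be judged against a fixed reference.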