AI Summary
Large language models (LLMs) incur substantial energy consumption and carbon emissions, hindering sustainable software development. Method: This study investigates prompt engineering as a green optimization lever for small language models (SLMs) in code generation, evaluating four open-source SLMs (Qwen2.5-Coder, StableCode-3B, CodeLlama-7B, and Phi-3-Mini-4K) on LeetCode Python tasks under role-based, zero-shot, few-shot, and chain-of-thought (CoT) prompting. For each generated solution we empirically measure runtime, memory footprint, and energy consumption. Contribution/Results: CoT prompting significantly reduces energy consumption for certain SLMs, with strong model specificity: Qwen2.5-Coder and StableCode-3B achieve consistent energy savings that surpass the human-written baseline, whereas CodeLlama-7B and Phi-3-Mini-4K fail to match it. This work is the first to empirically demonstrate CoT as an effective, energy-efficient prompting paradigm for SLMs in code generation, establishing a reproducible, low-power pathway for AI-assisted programming.
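The four prompting strategies can be made concrete with template sketches. The exact prompt wording used in the study is not specified here, so the templates below are illustrative assumptions only, shown for a LeetCode-style problem statement:

```python
# Illustrative templates for the four prompting strategies compared in the
# study. The wording is a hypothetical example, not the authors' prompts.
PROBLEM = (
    "Given an array of integers nums and an integer target, return the "
    "indices of the two numbers that add up to target."
)

PROMPTS = {
    # Role prompting: assign the model a persona before the task.
    "role": (
        "You are an expert Python programmer.\n"
        f"Solve the following problem:\n{PROBLEM}"
    ),
    # Zero-shot: the task alone, no examples or persona.
    "zero_shot": f"Solve the following problem in Python:\n{PROBLEM}",
    # Few-shot: prepend one or more worked examples.
    "few_shot": (
        "Example problem: reverse a string.\n"
        "Example solution:\n"
        "def reverse(s):\n    return s[::-1]\n\n"
        f"Now solve the following problem in Python:\n{PROBLEM}"
    ),
    # Chain-of-thought: ask for step-by-step reasoning before the code.
    "chain_of_thought": (
        f"Solve the following problem in Python:\n{PROBLEM}\n"
        "Think step by step: first outline your approach, then write the code."
    ),
}
```

Each template would be sent to every model for every benchmark problem, so that measured differences in energy use can be attributed to the prompting strategy rather than the task.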
Abstract
There is growing concern about the environmental impact of large language models (LLMs) in software development, particularly due to their high energy use and carbon footprint. Small language models (SLMs) offer a more sustainable alternative, requiring fewer computational resources while remaining effective for fundamental programming tasks. In this study, we investigate whether prompt engineering can improve the energy efficiency of SLMs in code generation. We evaluate four open-source SLMs, StableCode-Instruct-3B, Qwen2.5-Coder-3B-Instruct, CodeLlama-7B-Instruct, and Phi-3-Mini-4K-Instruct, on 150 Python problems from LeetCode, evenly distributed across the easy, medium, and hard categories. Each model is tested under four prompting strategies: role prompting, zero-shot, few-shot, and chain-of-thought (CoT). For every generated solution, we measure runtime, memory usage, and energy consumption, comparing the results against a human-written baseline. Our findings show that CoT prompting provides consistent energy savings for Qwen2.5-Coder and StableCode-3B, while CodeLlama-7B and Phi-3-Mini-4K fail to outperform the baseline under any prompting strategy. These results highlight that the benefits of prompting are model-dependent and that carefully designed prompts can guide SLMs toward greener software development.
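The per-solution measurement step can be sketched with the standard library alone. The sketch below captures runtime and peak memory for a generated solution; energy measurement requires hardware counters (e.g. Intel RAPL) and is deliberately omitted here, so this is a minimal illustration of the profiling loop, not the study's instrumentation:

```python
# Minimal sketch of per-solution profiling: runtime via a high-resolution
# timer, peak memory via tracemalloc. Energy measurement (hardware-counter
# based, e.g. RAPL) is outside the standard library and omitted.
import time
import tracemalloc

def profile_solution(solve, test_cases):
    """Run a generated solution against (args, expected) pairs and
    return (total runtime in seconds, peak traced memory in bytes)."""
    tracemalloc.start()
    start = time.perf_counter()
    for args, expected in test_cases:
        assert solve(*args) == expected, "generated solution is incorrect"
    runtime = time.perf_counter() - start
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return runtime, peak

# Example: profiling a LeetCode-style "two sum" solution.
def two_sum(nums, target):
    seen = {}
    for i, x in enumerate(nums):
        if target - x in seen:
            return [seen[target - x], i]
        seen[x] = i

runtime, peak = profile_solution(
    two_sum, [(([2, 7, 11, 15], 9), [0, 1])]
)
```

In the study's setting, the same harness would be applied to the human-written baseline solution for each problem, so that a model's prompting strategy can be judged against a fixed reference.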