Towards Sustainable NLP: Insights from Benchmarking Inference Energy in Large Language Models

📅 2025-02-08
📈 Citations: 0
Influential: 0
🤖 AI Summary
Large language models (LLMs) incur substantial energy consumption during inference, yet the factors driving their energy efficiency remain poorly understood and call for systematic investigation. This paper introduces the first comprehensive energy benchmarking framework for LLM inference, spanning diverse tasks, models, and system configurations: it evaluates 12 mainstream LLMs across 7 NLP tasks, empirically analyzing how model architecture, task type, prompt design, batch size, and quantization strategy (INT4/INT8) affect energy consumption. The study finds that energy consumption scales nearly linearly with output token count and is strongly influenced by response latency. Combining INT4/INT8 quantization, batch-size tuning, and energy-aware prompt engineering reduces inference energy by up to 67%. The work establishes a reproducible energy-efficiency evaluation methodology and offers practical guidelines for green LLM deployment, laying groundwork for low-carbon AI systems.
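The core accounting idea behind benchmarks like this (typically built on sampled GPU power readings, e.g. via NVML) is to integrate power over the response window. A minimal sketch, with illustrative sample data and function names that are assumptions rather than the paper's implementation:

```python
# Sketch: estimate inference energy by integrating sampled power over time
# (trapezoidal rule). Sample values below are made up for illustration.

def energy_joules(timestamps, power_watts):
    """Integrate power (W) over time (s) to get energy (J)."""
    total = 0.0
    for i in range(1, len(timestamps)):
        dt = timestamps[i] - timestamps[i - 1]
        total += 0.5 * (power_watts[i] + power_watts[i - 1]) * dt
    return total

# Example: a 2-second generation at a roughly steady ~150 W draw.
ts = [0.0, 0.5, 1.0, 1.5, 2.0]
pw = [150.0, 152.0, 149.0, 151.0, 150.0]
print(energy_joules(ts, pw))  # 301.0 J, i.e. ~150 W sustained for 2 s
```

Because generation time grows with every output token, this integral grows with output length, which is consistent with the near-linear token-count scaling the summary reports.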

📝 Abstract
Large language models (LLMs) are increasingly recognized for their exceptional generative capabilities and versatility across various tasks. However, the high inference costs associated with these models have not received adequate attention, particularly when compared to the focus on training costs in existing research. In response to this gap, our study conducts a comprehensive benchmarking of LLM inference energy across a wide range of NLP tasks, where we analyze the impact of different models, tasks, prompts, and system-related factors on inference energy. Specifically, our experiments reveal several interesting insights, including strong correlation of inference energy with output token length and response time. Also, we find that quantization and optimal batch sizes, along with targeted prompt phrases, can significantly reduce energy usage. This study is the first to thoroughly benchmark LLM inference across such a diverse range of aspects, providing insights and offering several recommendations for improving energy efficiency in model deployment.
Problem

Research questions and friction points this paper is trying to address.

Benchmarks LLM inference energy across NLP tasks
Analyzes factors affecting inference energy efficiency
Proposes methods to reduce energy usage in LLMs
Innovation

Methods, ideas, or system contributions that make the work stand out.

Benchmarks LLM inference energy
Quantization reduces energy usage
Optimal batch sizes improve efficiency
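As a toy illustration of the INT8 quantization idea listed above (a generic symmetric-quantization sketch, not the paper's code), weights are mapped to 8-bit integers with a per-tensor scale, cutting memory footprint and traffic, which is the main lever for the energy savings:

```python
# Toy symmetric INT8 weight quantization: floats -> int8 codes + one scale.
# Each weight then takes 1 byte instead of 4 (FP32), a 4x memory reduction.

def quantize_int8(weights):
    """Map floats to int8 codes in [-127, 127] with a per-tensor scale."""
    scale = max(abs(w) for w in weights) / 127.0
    return [round(w / scale) for w in weights], scale

def dequantize(codes, scale):
    return [c * scale for c in codes]

w = [0.42, -1.27, 0.08, 0.99]
q, s = quantize_int8(w)
w_hat = dequantize(q, s)
print(q)  # [42, -127, 8, 99]
# Round-trip error stays within one quantization step (the scale).
print(max(abs(a - b) for a, b in zip(w, w_hat)) <= s)  # True
```

INT4 follows the same scheme with codes in [-7, 7] (or [-8, 7]), trading more quantization error for a further 2x memory cut.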