🤖 AI Summary
This work addresses redundant output, commonly referred to as "babbling", in code generated by large language models (LLMs), which imposes unnecessary cognitive load on developers and increases computational overhead and energy consumption. The authors propose a model-agnostic Babbling Suppression method that integrates a test-driven early-stopping mechanism into the LLM code-generation pipeline: during generation, program tests are executed on the intermediate output, and decoding terminates as soon as the output passes all tests, without requiring any model modification or fine-tuning. Experimental results show that this approach reduces energy consumption by up to 65% on Python and 62% on Java benchmarks. Across 40 experimental configurations, mean energy usage decreased in 29 cases and token generation was reduced in 35, all while preserving solution accuracy and significantly improving both generation efficiency and energy efficiency.
📝 Abstract
**Context:** Large Language Models (LLMs) are increasingly used in modern software development, aiding code generation, code completion, and refactoring through AI-powered assistants. While they accelerate development workflows, they often produce extraneous output, referred to as "babbling", which incurs additional cognitive, economic, and energy costs.

**Objective:** This work investigates babbling in LLM-based code generation and proposes a practical, model-agnostic approach to reducing unnecessary output without compromising solution accuracy.

**Method:** We introduce Babbling Suppression (BS), a method that integrates test execution into the LLM generation process: intermediate outputs are evaluated during decoding, and generation terminates as soon as a solution passes all tests. This prevents excessive token generation without affecting model accuracy. An empirical study was conducted across two Python and two Java benchmarks, targeting four 3-4B-parameter models and six 6-7B-parameter models.

**Results:** Babbling occurs across all tested models, more frequently in Java than in Python. Applying BS significantly reduces energy consumption, by up to 65% for Python and 62% for Java, in models prone to babbling. Across 40 model-benchmark pairs, 29 showed reduced mean energy consumption, with reductions exceeding 20% in 22 cases. The generated token count decreased in 35 pairs, while the GPU energy-per-token overhead of BS remained below 10% for 26 pairs, decreased for 2, and peaked at 24%, yielding net energy savings in most cases.

**Implications:** BS can make AI-assisted programming more efficient and sustainable by reducing energy consumption and minimizing cognitive effort for developers. Its model-agnostic design allows easy integration, suggesting broad applicability.
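The core of the method described above is a decoding loop with test-driven early stopping. The snippet below is a minimal sketch of that idea, not the authors' implementation: a list of pre-baked text chunks stands in for an LLM's token stream, and all names (`run_tests`, `generate_with_bs`, `check_add`) are hypothetical.

```python
# Minimal sketch of Babbling Suppression: accumulate generated chunks,
# run the tests after each one, and stop decoding once they all pass.
# A real pipeline would hook into the model's decoding loop and sandbox
# the test execution; here a fixed chunk list simulates the generation.

def run_tests(code: str, tests) -> bool:
    """Execute candidate code and return True iff every test passes."""
    namespace = {}
    try:
        exec(code, namespace)          # incomplete code raises SyntaxError
        for test in tests:
            test(namespace)            # a failing test raises AssertionError
    except Exception:
        return False
    return True


def generate_with_bs(chunks, tests):
    """Consume chunks in order; terminate early once the output passes."""
    output = ""
    used = 0
    for chunk in chunks:
        output += chunk
        used += 1
        if run_tests(output, tests):
            break                      # early stop: babbling suppressed
    return output, used


def check_add(namespace):
    assert namespace["add"](2, 3) == 5


# Simulated model output: a correct solution followed by "babbling".
chunks = [
    "def add(a, b):\n",
    "    return a + b\n",
    "# Example usage:\n",              # babbling starts here
    "# print(add(2, 3))\n",
]

text, used = generate_with_bs(chunks, [check_add])
```

In this toy run the tests first pass after the second chunk, so the two trailing commentary chunks are never emitted, which is exactly the saving BS targets.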