Concise Thoughts: Impact of Output Length on LLM Reasoning and Cost

📅 2024-07-29
🏛️ arXiv.org
📈 Citations: 4
Influential: 0
🤖 AI Summary
Large language models (LLMs) employing chain-of-thought (CoT) reasoning often generate verbose outputs, leading to low inference efficiency, high computational energy consumption, and needlessly long answers. Method: This paper proposes a “correct conciseness” evaluation framework that jointly quantifies accuracy and conciseness, and introduces constrained chain-of-thought (CCoT), a prompting strategy that asks the model to bound the length of its reasoning; two complementary scores, based on redundancy and information flow in generated answers, further characterize conciseness. A multi-dataset benchmark is constructed to evaluate CCoT across mainstream LLMs. Contribution/Results: Experiments demonstrate that CCoT reduces average output length by 32%, improving inference efficiency and, for several models, answer accuracy while preserving logical rigor. This work establishes a paradigm for efficient, controllable LLM reasoning and provides a reproducible technical pathway toward concise, reliable reasoning.
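The summary's joint accuracy-and-conciseness idea can be sketched as a score in which only correct answers count, and correct answers over a length budget either count nothing (hard variant) or decay with the excess length (soft variant). This is an illustrative sketch, not the paper's exact metric definitions; the function name, `beta` decay parameter, and budget semantics are assumptions.

```python
import math

def concise_accuracy(correct, lengths, max_len, beta=None):
    """Illustrative correctness-vs-length score (not the paper's exact metrics).

    correct: list of booleans, whether each answer is right.
    lengths: list of output lengths (e.g., in tokens or words).
    max_len: the length budget.
    beta:    if None, over-budget answers score 0 (hard variant);
             otherwise they decay exponentially with the excess (soft variant).
    """
    total = 0.0
    for ok, n in zip(correct, lengths):
        if not ok:
            continue  # wrong answers never score, regardless of length
        if n <= max_len:
            total += 1.0  # correct and within budget: full credit
        elif beta is not None:
            total += math.exp(-(n - max_len) / beta)  # penalized partial credit
    return total / len(correct)
```

Under this sketch, a model that is always correct but always over budget scores 0 in the hard variant, which is what forces the accuracy/conciseness trade-off the framework is meant to expose.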

📝 Abstract
Today's large language models (LLMs) can solve challenging question-answering tasks, and prompt engineering techniques, such as chain-of-thought (CoT), have gained attention for enhancing the explanation and correctness of outputs. However, many models and techniques tend to produce excessively verbose and lengthy answers, leading to issues with both conciseness and generation time. To address this, this paper analyzes the impact of output lengths on LLM inference pipelines by proposing novel metrics to evaluate the “correct conciseness” of a model and related prompting techniques. Then, we examine the impact of controlling output length through a refined prompt engineering strategy, Constrained-CoT (CCoT), which encourages the model to produce more concise outputs. To better understand the effects of such a prompt, we also introduce two additional scores for analyzing the conciseness, measured in terms of redundancy and information flow in generated answers. Experiments on pretrained LLMs and multiple datasets demonstrate the benefits of the proposed metrics and the effectiveness of CCoT across different models.
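The CCoT strategy described in the abstract amounts to stating an explicit length budget inside the prompt. A minimal sketch, assuming a word-based limit; the exact wording and the default 45-word budget here are illustrative, not the paper's verbatim prompt:

```python
def ccot_prompt(question: str, word_limit: int = 45) -> str:
    """Build a constrained chain-of-thought (CCoT) style prompt.

    The instruction asks the model to reason step by step while keeping
    the whole answer within a stated word budget. Phrasing is illustrative.
    """
    return (
        f"Q: {question}\n"
        f"Let's think step by step and limit the answer "
        f"to {word_limit} words."
    )
```

In practice the budget is a tunable knob: the abstract's framing suggests sweeping it to study how tightening the limit trades reasoning length against answer accuracy per model.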
Problem

Research questions and friction points this paper is trying to address.

Large Language Models
Chain-of-Thought Method
Energy Consumption
Innovation

Methods, ideas, or system contributions that make the work stand out.

Constrained Chain of Thought
Optimization of Large Language Models
Enhanced Conciseness and Speed
Sania Nayab
Department of Excellence in Robotics and AI, Scuola Superiore Sant’Anna, Pisa, Italy
Giulio Rossolini
Scuola Superiore Sant'Anna
Trustworthy AI, Safe and Secure AI, Computer Vision, LLMs
Giorgio Buttazzo
Professor of Computer Science, Scuola Superiore Sant'Anna
Real-Time Systems
Nicolamaria Manes
Mediavoice Srl - Roma e Napoli, Italy
F. Giacomelli
Mediavoice Srl - Roma e Napoli, Italy