🤖 AI Summary
Large language models (LLMs) struggle with precise output length control in zero-shot summarization, exhibiting metric dependency and systematic length bias. This work is the first to systematically uncover the mechanisms behind this failure. We propose four zero-shot, parameter-free, architecture-agnostic strategies—length approximation, target adjustment, sample filtering, and automatic revision—that combine prompt engineering, dynamic length estimation, heuristic sampling-based filtering, and iterative post-editing. Evaluated with LLaMA 3, these strategies significantly improve length compliance (up to +42% absolute gain) without degrading ROUGE scores, and human evaluations confirm that summary quality is maintained or improved. The framework is scalable and plug-and-play, enabling high-fidelity, length-controllable zero-shot summarization without fine-tuning or model modification.
📝 Abstract
Large language models (LLMs) struggle with precise length control, particularly in zero-shot settings. We conduct a comprehensive study evaluating LLMs' length control capabilities across multiple measures and propose practical methods to improve controllability. Our experiments with LLaMA 3 reveal stark differences in length adherence across measures and highlight inherent biases of the model. To address these challenges, we introduce a set of methods: length approximation, target adjustment, sample filtering, and automated revisions. By combining these methods, we demonstrate substantial improvements in length compliance while maintaining or enhancing summary quality, providing highly effective zero-shot strategies for precise length control without the need for model fine-tuning or architectural changes. With our work, we not only advance our understanding of LLM behavior in controlled text generation but also pave the way for more reliable and adaptable summarization systems in real-world applications.
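Two of the proposed methods, target adjustment and sample filtering, can be illustrated with a minimal sketch. This is a hypothetical illustration, not the paper's implementation: the function names, the bias-ratio calibration, and the word-count distance metric are our own assumptions about how such strategies could be wired together around any text generator.

```python
def adjusted_target(target_words: int, bias_ratio: float) -> int:
    """Target adjustment (hypothetical): if the model systematically
    overshoots the requested length by `bias_ratio` (observed output
    length / requested length), ask for a shorter target to compensate."""
    return max(1, round(target_words / bias_ratio))


def filter_by_length(candidates: list[str], target_words: int) -> str:
    """Sample filtering (hypothetical): among several sampled summaries,
    keep the one whose word count is closest to the target length."""
    return min(candidates, key=lambda s: abs(len(s.split()) - target_words))


# Example: a model that tends to produce ~25% more words than requested
# (bias_ratio = 1.25) would be prompted for 40 words to land near 50.
request = adjusted_target(50, bias_ratio=1.25)

# Example: pick the closest of three sampled candidates to a 5-word target.
best = filter_by_length(["too short", "one two three four five", "way too long " * 4], 5)
```

In practice, the candidates would come from repeated sampling of the same LLM prompt, and the bias ratio would be estimated from a small calibration set; the filtering step is model-agnostic, which is what makes the approach plug-and-play.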