🤖 AI Summary
This work addresses controllable abstractive summarization under predefined length constraints, assuming the pre-trained model architecture stays fixed. The authors propose a lightweight, architecture-agnostic length-control method that requires no modification to the model structure: it introduces an adjustable weight on the EOS-token prediction term within the standard cross-entropy loss, which they present as the first approach to embed length control directly into the training objective. The method is decoding-agnostic, orthogonal to inference-time length-control strategies, and compatible with both encoder-decoder models and GPT-style large language models. Experiments across multiple models demonstrate stable, fine-grained length control: the method consistently matches or improves ROUGE scores and significantly outperforms existing length-constrained baselines, without compromising summary quality.
📝 Abstract
Controlling the length of generated text can be crucial in various text-generation tasks, including summarization. Existing methods often require complex model alterations, limiting compatibility with pre-trained models. We address these limitations by developing a simple approach for controlling the length of automatic text summaries: increasing the importance of correctly predicting the EOS token in the cross-entropy loss computation. The proposed methodology is agnostic to architecture and decoding algorithm, and orthogonal to other inference-time techniques for controlling generation length. We test it with encoder-decoder models and modern GPT-style LLMs, and show that it can control generation length, often without affecting the quality of the summary.
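The core idea described in the abstract can be sketched as a per-token cross-entropy in which positions whose gold label is the EOS token are up-weighted. This is a minimal illustration, not the authors' implementation: the function name, the `eos_weight` hyperparameter value, and the choice to normalize by the sum of weights are all assumptions.

```python
import numpy as np

def eos_weighted_ce(logits, targets, eos_id, eos_weight=5.0):
    """Cross-entropy over a batch of token logits, with EOS positions up-weighted.

    logits:  float array of shape (batch, seq_len, vocab)
    targets: int array of shape (batch, seq_len) with gold token ids
    eos_id:  vocabulary id of the EOS token
    eos_weight: multiplier applied to the loss at EOS positions (assumed
                hyperparameter; eos_weight=1.0 recovers plain cross-entropy)
    """
    # Numerically stable log-softmax over the vocabulary axis.
    z = logits - logits.max(axis=-1, keepdims=True)
    log_probs = z - np.log(np.exp(z).sum(axis=-1, keepdims=True))

    # Negative log-likelihood of each gold token.
    nll = -np.take_along_axis(log_probs, targets[..., None], axis=-1).squeeze(-1)

    # Up-weight the positions where the gold label is EOS.
    weights = np.where(targets == eos_id, eos_weight, 1.0)
    return (weights * nll).sum() / weights.sum()
```

With `eos_weight=1.0` this reduces to the ordinary mean cross-entropy; larger values push the model harder to place EOS correctly, which is the lever the paper uses to steer output length.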