🤖 AI Summary
This work addresses two challenges in long-input, long-output text generation: the absence of standardized benchmarks and the “lost-in-the-middle” phenomenon. To this end, we introduce LongInOutBench, the first comprehensive benchmark for this setting, featuring synthetic datasets and a multidimensional evaluation framework that integrates automated metrics and human annotation. We further propose RAL-Writer, a method that combines retrieval-augmented generation (RAG) with explicit prompt rewriting to dynamically identify and reinforce salient mid-sequence information that standard models overlook. RAL-Writer is the first approach to jointly leverage structured prompt rewriting and RAG to mitigate contextual information decay in long-context settings. Extensive experiments on LongInOutBench show that RAL-Writer significantly outperforms state-of-the-art long-context models, achieving a 27.4% improvement in key-information recall and substantial gains in coherence and factual consistency. Both the benchmark and the implementation are publicly released.
📝 Abstract
Existing long-text generation methods concentrate primarily on producing long texts from short inputs and neglect tasks that pair long inputs with long outputs. Such tasks have numerous practical applications, yet no benchmark is available for them. Moreover, as the input grows longer, existing methods inevitably encounter the "lost-in-the-middle" phenomenon. In this paper, we first introduce the Long Input and Output Benchmark (LongInOutBench), which comprises a synthetic dataset and a comprehensive evaluation framework and thereby addresses the missing-benchmark problem. We then develop the Retrieval-Augmented Long-Text Writer (RAL-Writer), which retrieves important yet overlooked content and restates it in explicitly constructed prompts, mitigating the "lost-in-the-middle" issue. Finally, we use LongInOutBench to evaluate RAL-Writer against comparable baselines, and the results demonstrate the effectiveness of our approach. Our code has been released at https://github.com/OnlyAR/RAL-Writer.
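The retrieve-and-restate idea described in the abstract can be illustrated with a minimal sketch. This is not the released RAL-Writer implementation: the function names (`split_into_chunks`, `overlap_score`, `build_restating_prompt`), the word-overlap relevance score, the fixed-size chunking, and the prompt wording are all assumptions chosen only to keep the example self-contained. The sketch splits a long input into chunks, ranks the middle chunks by overlap with the writing instruction, and repeats the top-ranked ones near the end of the prompt, where they are less likely to be overlooked.

```python
import re
from collections import Counter


def split_into_chunks(text: str, chunk_size: int = 40) -> list[str]:
    """Split text into consecutive chunks of at most `chunk_size` words."""
    words = text.split()
    return [" ".join(words[i:i + chunk_size]) for i in range(0, len(words), chunk_size)]


def overlap_score(query: str, chunk: str) -> int:
    """Toy relevance score (assumption): occurrences of query terms in the chunk."""
    query_terms = set(re.findall(r"\w+", query.lower()))
    chunk_terms = Counter(re.findall(r"\w+", chunk.lower()))
    return sum(chunk_terms[term] for term in query_terms)


def build_restating_prompt(instruction: str, document: str,
                           top_k: int = 2, chunk_size: int = 40) -> str:
    """Build a prompt that restates the most relevant middle chunks explicitly."""
    chunks = split_into_chunks(document, chunk_size)
    # Treat everything except the first and last chunk as the "middle" region.
    middle = chunks[1:-1] if len(chunks) > 2 else chunks
    ranked = sorted(middle, key=lambda c: overlap_score(instruction, c), reverse=True)
    restated = "\n".join(f"- {chunk}" for chunk in ranked[:top_k])
    return (
        f"{document}\n\n"
        f"Key passages to keep in mind:\n{restated}\n\n"
        f"Task: {instruction}"
    )


if __name__ == "__main__":
    doc = ("alpha alpha alpha alpha alpha "
           "budget figures rose sharply today "
           "omega omega omega omega omega")
    print(build_restating_prompt("Summarize the budget figures", doc,
                                 top_k=1, chunk_size=5))
```

A real system would presumably replace the word-overlap score with a proper retriever and have a model rewrite the retrieved passages rather than copy them verbatim; the sketch only shows where the restated content sits relative to the original input and the task instruction.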