Lost-in-the-Middle in Long-Text Generation: Synthetic Dataset, Evaluation Framework, and Mitigation

📅 2025-03-10

📈 Citations: 0

✨ Influential: 0

career value

177K/year

🤖 AI Summary

This work addresses two critical challenges in long-input-to-long-output text generation: the absence of standardized benchmarks and the “lost-in-the-middle” phenomenon. To this end, we introduce LongInOutBench—the first comprehensive benchmark featuring synthetic datasets and a multidimensional evaluation framework integrating automated metrics and human annotation. We further propose RAL-Writer, a novel method that synergistically combines retrieval-augmented generation (RAG) with explicit prompt rewriting to dynamically identify and reinforce salient mid-sequence information overlooked by standard models. Crucially, RAL-Writer is the first approach to jointly leverage structured prompt rewriting and RAG to mitigate contextual information decay in long-context settings. Extensive experiments on LongInOutBench demonstrate that RAL-Writer significantly outperforms state-of-the-art long-context models, achieving a 27.4% improvement in key-information recall and substantial gains in coherence and factual consistency. Both the benchmark and implementation are publicly released.

Technology Category

Application Category

📝 Abstract

Existing long-text generation methods primarily concentrate on producing lengthy texts from short inputs, neglecting the long-input and long-output tasks. Such tasks have numerous practical applications while lacking available benchmarks. Moreover, as the input grows in length, existing methods inevitably encounter the"lost-in-the-middle"phenomenon. In this paper, we first introduce a Long Input and Output Benchmark (LongInOutBench), including a synthetic dataset and a comprehensive evaluation framework, addressing the challenge of the missing benchmark. We then develop the Retrieval-Augmented Long-Text Writer (RAL-Writer), which retrieves and restates important yet overlooked content, mitigating the"lost-in-the-middle"issue by constructing explicit prompts. We finally employ the proposed LongInOutBench to evaluate our RAL-Writer against comparable baselines, and the results demonstrate the effectiveness of our approach. Our code has been released at https://github.com/OnlyAR/RAL-Writer.

Problem

Research questions and friction points this paper is trying to address.

Addresses lack of benchmarks for long-input and long-output text generation tasks.

Mitigates 'lost-in-the-middle' issue in long-text generation using retrieval-augmented methods.

Introduces LongInOutBench for evaluating long-text generation models effectively.

Innovation

Methods, ideas, or system contributions that make the work stand out.

Introduced LongInOutBench for long-text tasks

Developed RAL-Writer to mitigate lost-in-the-middle

Used retrieval-augmented prompts for content restatement

🔎 Similar Papers

No similar papers found.