AI Summary
Bug reports are unstructured and verbose, and existing summarization methods predominantly rely on superficial textual cues while neglecting critical code snippets, which leads to redundant and incomplete summaries. To address this, we propose a generative summarization framework that progressively fuses natural language text with long code fragments. Our approach introduces a stepwise code integration mechanism that circumvents the context-length limitations of large language models (LLMs), enabling joint semantic modeling of textual and code modalities. Furthermore, we incorporate abstractive summarization techniques to enhance both accuracy and completeness in defect understanding. We evaluate our method across four benchmark datasets and eight LLMs; results show improvements of 7.5% to 58.2% over extractive baselines and performance competitive with state-of-the-art generative approaches.
Abstract
Bug reports are often unstructured and verbose, making it challenging for developers to efficiently comprehend software issues. Existing summarization approaches typically rely on surface-level textual cues, resulting in incomplete or redundant summaries, and they frequently ignore associated code snippets, which are essential for accurate defect diagnosis. To address these limitations, we propose a progressive code-integration framework for LLM-based abstractive bug report summarization. Our approach incrementally incorporates long code snippets alongside textual content, overcoming standard LLM context window constraints and producing semantically rich summaries. Evaluated on four benchmark datasets using eight LLMs, our pipeline outperforms extractive baselines by 7.5%-58.2% and achieves performance comparable to state-of-the-art abstractive methods, highlighting the benefits of jointly leveraging textual and code information for enhanced bug comprehension.
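The abstract does not spell out how the progressive integration works, so the following is only a minimal sketch of one plausible reading: summarize the report text first, then refine that summary one code chunk at a time so that no single prompt has to hold the whole snippet. The names `split_code` and `progressive_summarize`, the `max_chars` budget, the prompt wording, and the `summarize` callable (a stand-in for any LLM call) are all assumptions made for illustration, not the authors' pipeline.

```python
from typing import Callable, List


def split_code(code: str, max_chars: int) -> List[str]:
    """Split code into line-aligned chunks of at most max_chars characters.

    A single line longer than max_chars is kept whole as its own chunk,
    so that chunk may exceed the budget.
    """
    chunks: List[str] = []
    current: List[str] = []
    size = 0
    for line in code.splitlines(keepends=True):
        # Start a new chunk when adding this line would exceed the budget.
        if current and size + len(line) > max_chars:
            chunks.append("".join(current))
            current, size = [], 0
        current.append(line)
        size += len(line)
    if current:
        chunks.append("".join(current))
    return chunks


def progressive_summarize(
    text: str,
    code: str,
    summarize: Callable[[str], str],  # hypothetical wrapper around an LLM call
    max_chars: int = 2000,            # assumed per-chunk budget
) -> str:
    """Summarize the report text, then fold in code chunk by chunk."""
    # Step 1: draft a summary from the natural-language part only.
    summary = summarize(f"Summarize this bug report:\n{text}")
    # Step 2: refine the running summary with each code chunk in turn.
    for chunk in split_code(code, max_chars):
        prompt = (
            "Current summary:\n"
            f"{summary}\n\n"
            "Additional code from the report:\n"
            f"{chunk}\n\n"
            "Update the summary to reflect this code."
        )
        summary = summarize(prompt)
    return summary
```

In this sketch each prompt carries only the running summary plus one chunk, so prompt size is bounded by `max_chars` plus the summary length rather than by the full snippet. The cost is one model call per chunk, and details from early chunks survive only if the running summary retains them.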