Automatically Identifying Solution-Related Content in Issue Report Discussions with Language Models

📅 2025-11-09
📈 Citations: 0
Influential: 0
🤖 AI Summary
Identifying solution-related content in software issue report discussions remains challenging due to linguistic ambiguity and contextual complexity. Method: This study proposes a multi-strategy classification framework leveraging large language models (LLMs), systematically comparing three LLM application paradigms on the task: embedding-based retrieval, prompt engineering, and fine-tuning. A fine-tuned LLaMA model combined with ensemble learning achieves the best performance. Contribution/Results: Across 68 experimental configurations, the fine-tuned model attains an F1-score of 0.716, rising to 0.737 with ensemble integration. The approach demonstrates strong cross-project transferability, requiring only minimal target-project data for effective adaptation. This work presents the first systematic empirical evaluation of diverse LLM deployment strategies for solution-content identification in issue reports, establishing fine-tuning plus ensembling as the most effective trade-off between accuracy and generalizability. The method provides a practical, deployable foundation for automated defect resolution, regression analysis, and solution reuse in software maintenance.
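To make the embedding-based paradigm concrete, here is a minimal sketch of classifying issue comments by comparing their embeddings against per-class centroids. The `embed` function below is a toy bag-of-keywords stand-in, not the paper's actual LLM embedding; the comments, keywords, and labels are all illustrative assumptions.

```python
import math

# Hypothetical stand-in for an LLM embedding API: maps text to a vector.
# In the paper's setting this would be a sentence embedding from an LLM.
def embed(text):
    # Toy bag-of-keywords "embedding" for illustration only.
    keywords = ["patch", "fix", "implement", "steps", "reproduce", "crash"]
    words = text.lower().split()
    return [sum(w.startswith(k) for w in words) for k in keywords]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def centroid(vectors):
    return [sum(col) / len(vectors) for col in zip(*vectors)]

# Tiny labeled set (invented examples): 1 = solution-related, 0 = not.
train = [
    ("attached a patch that should fix the crash", 1),
    ("we could implement a null check here to fix it", 1),
    ("steps to reproduce: open the menu and click twice", 0),
    ("the crash happens on every startup", 0),
]

sol_centroid = centroid([embed(t) for t, y in train if y == 1])
non_centroid = centroid([embed(t) for t, y in train if y == 0])

def classify(text):
    # Assign the class whose centroid is closer in cosine similarity.
    v = embed(text)
    return 1 if cosine(v, sol_centroid) >= cosine(v, non_centroid) else 0
```

In practice, a trained classifier (e.g. logistic regression over the embeddings) would replace the nearest-centroid rule, but the pipeline shape (embed, then classify) is the same.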

📝 Abstract
During issue resolution, software developers rely on issue reports to discuss solutions for defects, feature requests, and other changes. These discussions contain proposed solutions, from design changes to code implementations, as well as their evaluations. Locating solution-related content is essential for investigating reopened issues, addressing regressions, reusing solutions, and understanding code change rationale. Manually reading long discussions to identify such content can be difficult and time-consuming. This paper automates solution identification using language models as supervised classifiers. We investigate three applications (embeddings, prompting, and fine-tuning) across three classifier types: traditional ML models (MLMs), pre-trained language models (PLMs), and large language models (LLMs). Using 356 Mozilla Firefox issues, we created a dataset to train and evaluate six MLMs, four PLMs, and two LLMs across 68 configurations. Results show that MLMs with LLM embeddings outperform TF-IDF features, prompting underperforms, and fine-tuned LLMs achieve the highest performance, with LLAMAft reaching 0.716 F1 score. Ensembles of the best models further improve results (0.737 F1). Misclassifications often arise from misleading clues or missing context, highlighting the need for context-aware classifiers. Models trained on Mozilla transfer to other projects, with a small amount of project-specific data further enhancing results. This work supports software maintenance, issue understanding, and solution reuse.
Problem

Research questions and friction points this paper is trying to address.

Automates identification of solution-related content in issue reports
Compares language model approaches for classifying software discussions
Addresses manual analysis challenges in software maintenance and reuse
Innovation

Methods, ideas, or system contributions that make the work stand out.

Using language models as supervised classifiers
Fine-tuned LLMs achieve highest performance
Ensemble models further improve classification results
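The ensemble step in the last bullet can be sketched as majority voting over several component classifiers. The three heuristic classifiers below are invented placeholders; in the paper's setting the voters would be the best-performing trained models (e.g. the fine-tuned LLM plus embedding-based MLMs).

```python
from collections import Counter

# Hypothetical component classifiers: each maps a comment to 0/1
# (1 = solution-related). These toy heuristics stand in for trained models.
def keyword_clf(text):
    return int(any(k in text.lower() for k in ("patch", "fix", "implement")))

def length_clf(text):
    # Toy assumption: solution proposals tend to be longer comments.
    return int(len(text.split()) > 6)

def question_clf(text):
    # Questions are usually requests for information, not solutions.
    return int(not text.rstrip().endswith("?"))

def ensemble(text, clfs=(keyword_clf, length_clf, question_clf)):
    # Majority vote over the component classifiers' predictions.
    votes = Counter(clf(text) for clf in clfs)
    return votes.most_common(1)[0][0]
```

With an odd number of voters there are no ties; the reported gain (0.716 to 0.737 F1) comes from combining models whose errors differ.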