🤖 AI Summary
In open-source software, novice contributors struggle to interpret static-analysis-derived defect-prediction metrics (e.g., cyclomatic complexity, coupling), which hinders risk-aware code modification and decision-making. To address this, we propose an LLM-driven risk-explanation framework that translates these abstract metrics into three types of natural-language output: descriptive risk summaries, context-sensitive root-cause analyses, and actionable refactoring recommendations. The approach aims to bridge the semantic gap between static-analysis signals and developer practice, improving the interpretability and practical utility of the metrics. We outline a prototype targeting real-world OSS projects and a planned task-oriented user study that will compare metric-only baselines against LLM-generated explanations on review decision quality, review time, and error rates. To our knowledge, this is the first systematic investigation of how large language models can enhance the explainability of defect-prediction results and support developer decision-making in practical software engineering contexts.
📝 Abstract
Open Source Software (OSS) has become critical infrastructure worldwide because of the value it provides. OSS typically depends on contributions from developers with diverse backgrounds and levels of experience. Making safe changes, such as fixing a bug or implementing a new feature, can be challenging, especially in object-oriented systems where components are interdependent. Static analysis and defect-prediction tools produce metrics (e.g., complexity, coupling) that flag potentially fault-prone components, but these signals are often hard to interpret for contributors who are new to or unfamiliar with the codebase. Large Language Models (LLMs) have shown strong performance on software engineering tasks such as code summarization and documentation generation. Building on this progress, we investigate whether LLMs can translate fault-prediction metrics into clear, human-readable risk explanations and actionable guidance that help OSS contributors plan and review code modifications. We outline the explanation types that an LLM-based assistant could provide (descriptive, contextual, and actionable explanations). We also outline our next steps: assessing usefulness through a task-based study with OSS contributors, comparing metric-only baselines to LLM-generated explanations on decision quality, time-to-completion, and error rates.
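To make the three explanation types concrete, the sketch below shows one way static-analysis metrics for a single component could be packaged into an LLM prompt requesting descriptive, contextual, and actionable sections. This is a minimal illustration under our own assumptions: the function name `build_risk_prompt`, the metric keys, and the prompt wording are hypothetical, not the paper's actual interface or prompt.

```python
# Hypothetical sketch: packaging defect-prediction metrics into a prompt
# that asks an LLM for the three explanation types discussed above.
# All names and thresholds here are illustrative assumptions.

def build_risk_prompt(component: str, metrics: dict) -> str:
    """Format static-analysis metrics into a prompt requesting
    descriptive, contextual, and actionable explanations."""
    metric_lines = "\n".join(f"- {name}: {value}" for name, value in metrics.items())
    return (
        "You are assisting a new OSS contributor.\n"
        f"Component: {component}\n"
        f"Static-analysis metrics:\n{metric_lines}\n\n"
        "Provide three sections:\n"
        "1. Descriptive: summarize the risk these metrics indicate.\n"
        "2. Contextual: explain likely root causes in this component.\n"
        "3. Actionable: suggest concrete refactoring steps to take "
        "before modifying this code.\n"
    )

# Example usage with made-up metric values for a fictional component.
prompt = build_risk_prompt(
    "payment/OrderProcessor.java",
    {
        "cyclomatic complexity": 27,
        "coupling between objects (CBO)": 14,
        "predicted fault-proneness": 0.81,
    },
)
print(prompt)
```

The resulting prompt would then be sent to an LLM; the key design point is that raw metric values and the requested explanation structure travel together, so the model grounds each section in the same signals a reviewer would see.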