Synergizing LLMs and Knowledge Graphs: A Novel Approach to Software Repository-Related Question Answering

📅 2024-12-05

🏛️ arXiv.org

📈 Citations: 1

✨ Influential: 0

🤖 AI Summary

To address large language models' (LLMs') inaccurate understanding of user intent and imprecise retrieval of relevant data in software repository question answering, this paper proposes a collaborative LLM–knowledge graph (KG) framework. First, a domain-specific KG is constructed in Neo4j from multi-source GitHub data (Issues, Pull Requests, and Code). Second, the KG is combined with the LLM so that natural language questions are answered from graph data. Evaluated on 20 questions of varying complexity across five popular open-source projects, the approach achieves 65% accuracy. An analysis of the failures identifies six key issues, most of them related to the LLM's reasoning capability; adding few-shot Chain-of-Thought (CoT) prompting raises overall accuracy to 84%. The approach is intended to make repository data accessible to both technical and non-technical stakeholders.
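
As a rough illustration of the first step, the sketch below builds a tiny graph from repository records and serializes the facts around one entity as prompt context. It is not the paper's implementation: the summary names Neo4j, whereas this sketch uses plain Python tuples so it runs without a database, and the sample records, relation names (`OPENED`, `FIXES`, `HAS_STATE`), and helper functions are all hypothetical.

```python
# Hypothetical, hand-written records standing in for data pulled from a repository.
ISSUES = [
    {"number": 1, "author": "alice", "state": "closed"},
    {"number": 2, "author": "bob", "state": "open"},
]
PULL_REQUESTS = [
    {"number": 10, "author": "bob", "fixes_issue": 1},
]


def build_graph(issues, pull_requests):
    """Return the graph as a list of (subject, relation, object) triples."""
    triples = []
    for issue in issues:
        node = f"Issue#{issue['number']}"
        triples.append((f"User:{issue['author']}", "OPENED", node))
        triples.append((node, "HAS_STATE", issue["state"]))
    for pr in pull_requests:
        node = f"PR#{pr['number']}"
        triples.append((f"User:{pr['author']}", "OPENED", node))
        triples.append((node, "FIXES", f"Issue#{pr['fixes_issue']}"))
    return triples


def neighbors(triples, subject, relation):
    """Objects reachable from `subject` over edges labeled `relation`."""
    return [o for s, r, o in triples if s == subject and r == relation]


def graph_context(triples, entity):
    """Serialize every triple touching `entity` as text lines for a prompt."""
    return "\n".join(
        f"{s} -[{r}]-> {o}" for s, r, o in triples if entity in (s, o)
    )


graph = build_graph(ISSUES, PULL_REQUESTS)
print(neighbors(graph, "User:bob", "OPENED"))
print(graph_context(graph, "Issue#1"))
```

In a database-backed version, `neighbors` would correspond to a graph query and `graph_context` to the step that turns the query result into text the LLM can read.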

📝 Abstract

Software repositories contain valuable information for gaining insights into their development process. However, extracting insights from this repository data is time-consuming and requires technical expertise. While software engineering chatbots have been developed to facilitate natural language interactions with repositories, they struggle with understanding natural language and accurately retrieving relevant data. This study aims to improve the accuracy of LLM-based chatbots in answering repository-related questions by augmenting them with knowledge graphs. We achieve this in two steps: (1) constructing a knowledge graph from the repository data and (2) synergizing the knowledge graph with an LLM to support natural language questions and answers. We curated a set of 20 questions of varying complexity and evaluated our approach on five popular open-source projects. Our approach achieved an accuracy of 65%. We further investigated the limitations and identified six key issues, with the majority relating to the reasoning capability of the LLM. We experimented with few-shot chain-of-thought prompting to determine whether it could enhance our approach. This technique improved the overall accuracy to 84%. Our findings demonstrate the synergy between LLMs and knowledge graphs as a viable solution for making repository data accessible to both technical and non-technical stakeholders.
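
The few-shot chain-of-thought step can be pictured as assembling a prompt from worked examples plus the graph facts retrieved for the new question. The sketch below is a minimal illustration under assumptions: the prompt wording, the single worked example, and the function name `build_cot_prompt` are hypothetical and are not taken from the paper, and no LLM is called.

```python
# One hypothetical worked example; the paper's actual examples are not given here.
FEW_SHOT_EXAMPLES = [
    {
        "question": "Who opened the pull request that fixes issue 1?",
        "reasoning": (
            "Find the pull request with a FIXES edge to Issue#1, "
            "then follow its OPENED edge back to a user."
        ),
        "answer": "bob",
    },
]


def build_cot_prompt(question, graph_facts, examples=FEW_SHOT_EXAMPLES):
    """Assemble a few-shot chain-of-thought prompt as a single string."""
    parts = [
        "Answer the question using only the graph facts. "
        "Reason step by step before giving the answer."
    ]
    # Each worked example shows the question, the reasoning, and the answer.
    for ex in examples:
        parts.append(
            f"Question: {ex['question']}\n"
            f"Reasoning: {ex['reasoning']}\n"
            f"Answer: {ex['answer']}"
        )
    facts = "\n".join(f"- {fact}" for fact in graph_facts)
    # The prompt ends on "Reasoning:" so the model continues with its own steps.
    parts.append(f"Graph facts:\n{facts}\nQuestion: {question}\nReasoning:")
    return "\n\n".join(parts)


prompt = build_cot_prompt(
    "How many issues did bob open?",
    ["User:bob -[OPENED]-> Issue#2", "User:bob -[OPENED]-> PR#10"],
)
print(prompt)
```

Ending the prompt on `Reasoning:` is one common way to elicit intermediate steps; the resulting string would be sent to whichever LLM the system uses.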

Problem

Research questions and friction points this paper is trying to address.

Improving chatbot accuracy for repository questions

Overcoming limitations of intent-based chatbot understanding

Enhancing data retrieval from software repositories

Innovation

Methods, ideas, or system contributions that make the work stand out.

Augmenting LLMs with knowledge graphs for QA

Constructing knowledge graphs from repository data

Applying few-shot chain-of-thought prompting

🔎 Similar Papers

Dual Reasoning: A GNN-LLM Collaborative Framework for Knowledge Graph Question Answering