Classifying Issues in Open-source GitHub Repositories

📅 2025-07-25

🤖 AI Summary

To address the challenge of efficiently categorizing large volumes of unlabeled issues in GitHub open-source repositories, this paper proposes a deep neural network (DNN)-based multi-label automatic classification framework. The method jointly encodes issue titles, descriptions, and contextual text via end-to-end learning to extract semantic features and predict multiple labels, eliminating reliance on handcrafted rules or predefined feature engineering. Extensive experiments across major open-source projects—including Kubernetes and Apache Spark—demonstrate that the proposed model achieves an average 12.3% improvement in macro-F1 score over traditional machine learning baselines (e.g., SVM, XGBoost) and state-of-the-art deep learning approaches. To our knowledge, this is the first work to adapt a lightweight DNN architecture specifically for cross-project, multi-label issue classification. The framework delivers high accuracy, strong generalizability across diverse projects, and practical deployability, thereby providing a scalable technical foundation for issue governance in open-source collaboration.
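The summary reports its gains in macro-F1. For multi-label classification, macro-F1 is commonly computed in two steps. Each label is scored separately as F1 = 2TP / (2TP + FP + FN). The per-label scores are then averaged without weights, so infrequent labels count as much as frequent ones. Below is a minimal sketch, assuming predictions have already been thresholded into multi-hot rows (one 0/1 column per label). The function name `macro_f1` is an assumption, as is scoring a label with no true or predicted positives as 0.0. This is not the paper's evaluation code.

```python
from typing import List, Sequence


def macro_f1(y_true: Sequence[Sequence[int]], y_pred: Sequence[Sequence[int]]) -> float:
    """Macro-averaged F1 over label columns of multi-hot rows (assumed input format)."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and the same length")
    n_labels = len(y_true[0])
    per_label: List[float] = []
    for j in range(n_labels):
        # Count outcomes for label column j across all issues.
        tp = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 1 and p[j] == 1)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 0 and p[j] == 1)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 1 and p[j] == 0)
        denom = 2 * tp + fp + fn
        # Assumed convention: a label never present in truth or prediction scores 0.0.
        per_label.append(2 * tp / denom if denom else 0.0)
    # Unweighted mean: every label contributes equally.
    return sum(per_label) / n_labels


# Three issues, three labels (columns).
y_true = [[1, 0, 1], [0, 1, 0], [1, 1, 0]]
y_pred = [[1, 0, 0], [0, 1, 0], [1, 0, 0]]
print(round(macro_f1(y_true, y_pred), 4))  # 0.5556
```

In this example the per-label scores are 1.0, 2/3 and 0.0, so the macro average is 5/9.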

📝 Abstract

GitHub is the most widely used platform for software maintenance in the open-source community. Developers report issues on GitHub when they run into problems. Labels on those issues help developers address them more easily, because a label gives prior knowledge of what kind of issue it is. However, most GitHub repositories do not label their issues consistently. The goal of this work is to classify issues in the open-source community using machine learning (ML) and deep neural network (DNN) models. There are thousands of open-source repositories on GitHub; some label their issues properly, whereas others do not. When issues are pre-labeled, the team can solve problems and immediately assign the responsible personnel more easily, which speeds up development. In this work, we analyzed prominent open-source GitHub repositories and classified their issues into common labels: API, Documentation, Enhancement, Question, Easy, Help-wanted, Dependency, CI, Waiting for OP's response, Test, Bug, etc. Our study shows that DNN models outperf
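The abstract lists the label set but does not say how issues become model inputs. Below is a minimal sketch, assuming each issue's title and body are joined into one text. Its labels are encoded as a multi-hot target vector with one 0/1 slot per label, so a single issue can carry several labels. The helper names (`build_input_text`, `label_names`, `multi_hot`, `to_example`) and the record shape are hypothetical. The only external fact relied on is that GitHub's REST API returns issue labels as objects with a `name` field.

```python
from typing import Any, Dict, List, Sequence, Tuple

# Label vocabulary from the abstract. The abstract's list ends with "etc.",
# so a real label set would likely be larger.
LABELS: List[str] = [
    "API", "Documentation", "Enhancement", "Question", "Easy",
    "Help-wanted", "Dependency", "CI", "Waiting for OP's response",
    "Test", "Bug",
]


def build_input_text(title: str, body: str) -> str:
    """Join title and body into one classifier input (assumed scheme), skipping empty parts."""
    parts = [p.strip() for p in (title, body)]
    return "\n\n".join(p for p in parts if p)


def label_names(raw_labels: Sequence[Any]) -> List[str]:
    """Accept GitHub-style label objects ({"name": ...}) or plain strings."""
    return [lab["name"] if isinstance(lab, dict) else str(lab) for lab in raw_labels]


def multi_hot(names: Sequence[str], vocab: Sequence[str] = LABELS) -> List[int]:
    """Encode label names as a 0/1 vector with one slot per vocabulary label.

    Matching is case-insensitive; names outside the vocabulary are ignored.
    """
    index: Dict[str, int] = {name.lower(): i for i, name in enumerate(vocab)}
    vec = [0] * len(vocab)
    for name in names:
        i = index.get(name.lower())
        if i is not None:
            vec[i] = 1
    return vec


def to_example(issue: Dict[str, Any]) -> Tuple[str, List[int]]:
    """Turn a hypothetical issue record into (input_text, multi_hot_target)."""
    text = build_input_text(issue.get("title") or "", issue.get("body") or "")
    return text, multi_hot(label_names(issue.get("labels") or []))


text, target = to_example({
    "title": " Crash on start ",
    "body": "Stack trace attached.",
    "labels": [{"name": "bug"}, "CI", "wontfix"],
})
print(target)  # [0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1]
```

Out-of-vocabulary labels such as "wontfix" are dropped here. That is one design choice among several; another is to collect them into an extra "other" slot.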

Problem

Classify GitHub issues using ML and DNN models

Automate labeling of issues to aid developers

Improve issue resolution speed with pre-labeled categories

Innovation

Classify GitHub issues using ML models

Apply DNN for better classification performance

Analyze prominent repositories for common labels
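The page names ML and DNN models without saying which ones were used. As a hedged illustration of the ML side, the sketch below is a hypothetical one-vs-rest baseline. It trains one logistic-regression classifier per label over binary bag-of-words features, which is one common way to turn a binary classifier into a multi-label one. The names `tokenize` and `OneVsRestLogReg` are assumptions, as are the learning rate, epoch count, 0.5 decision threshold, and toy data. None of these come from the paper.

```python
import math
import re
from collections import defaultdict
from typing import DefaultDict, List, Sequence, Set


def tokenize(text: str) -> Set[str]:
    """Lowercased alphanumeric tokens, as a set (binary bag-of-words)."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


class OneVsRestLogReg:
    """Hypothetical multi-label baseline: one logistic regression per label."""

    def __init__(self, n_labels: int, lr: float = 0.5, epochs: int = 50) -> None:
        self.n_labels = n_labels
        self.lr = lr
        self.epochs = epochs
        # Sparse weights: a token -> weight map and a bias for each label.
        self.w: List[DefaultDict[str, float]] = [defaultdict(float) for _ in range(n_labels)]
        self.b: List[float] = [0.0] * n_labels

    def _prob(self, j: int, tokens: Set[str]) -> float:
        z = self.b[j] + sum(self.w[j].get(t, 0.0) for t in tokens)
        z = max(-35.0, min(35.0, z))  # avoid math.exp overflow on extreme logits
        return 1.0 / (1.0 + math.exp(-z))

    def fit(self, texts: Sequence[str], targets: Sequence[Sequence[int]]) -> "OneVsRestLogReg":
        docs = [tokenize(t) for t in texts]
        for _ in range(self.epochs):
            for tokens, y in zip(docs, targets):
                for j in range(self.n_labels):
                    # For sigmoid + binary cross-entropy, d(loss)/d(logit) = p - y.
                    g = self._prob(j, tokens) - y[j]
                    self.b[j] -= self.lr * g
                    for t in tokens:
                        self.w[j][t] -= self.lr * g
        return self

    def predict_proba(self, text: str) -> List[float]:
        tokens = tokenize(text)
        return [self._prob(j, tokens) for j in range(self.n_labels)]

    def predict(self, text: str, threshold: float = 0.5) -> List[int]:
        """Assign every label whose probability reaches the threshold (zero or more labels)."""
        return [int(p >= threshold) for p in self.predict_proba(text)]


# Toy data: column 0 = "Bug", column 1 = "Documentation".
texts = [
    "crash error stack trace",
    "error crash segfault",
    "readme docs typo",
    "docs update guide",
]
targets = [[1, 0], [1, 0], [0, 1], [0, 1]]
model = OneVsRestLogReg(n_labels=2).fit(texts, targets)
print(model.predict("crash error"))  # [1, 0]
print(model.predict("docs typo"))    # [0, 1]
```

Thresholding each label independently is what makes this multi-label: an issue can receive zero, one, or several labels. A DNN variant would typically keep a sigmoid-per-label output layer like this one. It would replace the linear scoring with hidden layers over learned text representations.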
