SkillScope: A Tool to Predict Fine-Grained Skills Needed to Solve Issues on GitHub

📅 2025-01-27
🤖 AI Summary
New open-source contributors often struggle to identify suitable tasks because issue descriptions rarely state the skills required and existing labeling schemes are too coarse-grained. To address this, we propose a fine-grained, multi-level programming skill prediction method that integrates large language models (LLMs) with random forests (RFs). Our approach draws on GitHub API data, issues mined from Java projects, and ontology-based skill modeling to produce semantically rich, interpretable, and structured skill predictions. Unlike conventional tag-based systems, which rely on a small set of coarse labels, our method identifies 217 distinct fine-grained skills. In a case study, it achieves 91% precision, 88% recall, and 89% F1-score on average, substantially outperforming state-of-the-art skill annotation techniques. This work advances task–contributor matching through explainable, scalable, and empirically validated skill inference.
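The reported precision, recall, and F1-score are averages over many skill labels, and the summary does not say how they were averaged. A minimal sketch, assuming each issue has a set of gold skills and a set of predicted skills and that scores are macro-averaged per skill (an assumption, not necessarily the authors' protocol), is shown below. The function names and the skill labels `Java`, `SQL`, and `UI` are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Scores = Tuple[float, float, float]  # (precision, recall, F1)


def per_skill_scores(gold: List[Set[str]], predicted: List[Set[str]]) -> Dict[str, Scores]:
    """Score each skill label independently across all issues (multi-label setting)."""
    labels = set().union(*gold, *predicted)
    scores: Dict[str, Scores] = {}
    for label in sorted(labels):
        pairs = list(zip(gold, predicted))
        tp = sum(1 for g, p in pairs if label in g and label in p)
        fp = sum(1 for g, p in pairs if label not in g and label in p)
        fn = sum(1 for g, p in pairs if label in g and label not in p)
        # Convention: a score is 0.0 when its denominator is zero.
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores[label] = (precision, recall, f1)
    return scores


def macro_average(scores: Dict[str, Scores]) -> Scores:
    """Unweighted mean of each metric over all skill labels."""
    if not scores:
        return (0.0, 0.0, 0.0)
    n = len(scores)
    p = sum(s[0] for s in scores.values()) / n
    r = sum(s[1] for s in scores.values()) / n
    f = sum(s[2] for s in scores.values()) / n
    return (p, r, f)


if __name__ == "__main__":
    # Hypothetical gold and predicted skills for three issues.
    gold = [{"Java", "SQL"}, {"Java"}, {"UI"}]
    predicted = [{"Java"}, {"Java", "UI"}, {"UI"}]
    print(macro_average(per_skill_scores(gold, predicted)))
```

A macro-averaged F1 is not in general the harmonic mean of the averaged precision and recall, but as a sanity check, 2 × 0.91 × 0.88 / (0.91 + 0.88) is roughly 0.89, consistent with the reported 89%.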

📝 Abstract
New contributors often struggle to find tasks that they can tackle when onboarding onto a new Open Source Software (OSS) project. One reason for this difficulty is that issue trackers lack explanations about the knowledge or skills needed to complete a given task successfully. These explanations can be complex and time-consuming to produce. Past research has partially addressed this problem by labeling issues with issue types, issue difficulty level, and issue skills. However, current approaches are limited to a small set of labels and lack in-depth details about their semantics, which may not sufficiently help contributors identify suitable issues. To surmount this limitation, this paper explores large language models (LLMs) and Random Forest (RF) to predict the multilevel skills required to solve the open issues. We introduce a novel tool, SkillScope, which retrieves current issues from Java projects hosted on GitHub and predicts the multilevel programming skills required to resolve these issues. In a case study, we demonstrate that SkillScope could predict 217 multilevel skills for tasks with 91% precision, 88% recall, and 89% F-measure on average. Practitioners can use this tool to better delegate or choose tasks to solve in OSS projects.
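The abstract states that SkillScope retrieves current issues from GitHub but not how. A minimal sketch, assuming the public GitHub REST endpoint `GET /repos/{owner}/{repo}/issues` and only Python's standard library, might look like this. The function names, the `max_pages` limit, and the kept fields are hypothetical; filtering for Java projects (for example, by repository language) is not shown.

```python
import json
import urllib.parse
import urllib.request
from typing import Dict, List, Optional

API_ROOT = "https://api.github.com"


def open_issues_url(owner: str, repo: str, page: int = 1, per_page: int = 100) -> str:
    """Build the REST URL that lists open issues for one repository (100 is the max page size)."""
    query = urllib.parse.urlencode({"state": "open", "per_page": per_page, "page": page})
    return f"{API_ROOT}/repos/{owner}/{repo}/issues?{query}"


def drop_pull_requests(items: List[Dict]) -> List[Dict]:
    """The issues endpoint also returns pull requests; those items carry a 'pull_request' key."""
    return [item for item in items if "pull_request" not in item]


def fetch_open_issues(
    owner: str, repo: str, token: Optional[str] = None, max_pages: int = 5
) -> List[Dict]:
    """Page through open issues, keeping only the text a skill predictor would need."""
    headers = {"Accept": "application/vnd.github+json"}
    if token:
        headers["Authorization"] = f"Bearer {token}"
    issues: List[Dict] = []
    for page in range(1, max_pages + 1):
        request = urllib.request.Request(open_issues_url(owner, repo, page), headers=headers)
        with urllib.request.urlopen(request) as response:
            batch = json.load(response)
        if not batch:
            break  # an empty page means there are no more results
        issues.extend(
            {"number": item["number"], "title": item["title"], "body": item.get("body") or ""}
            for item in drop_pull_requests(batch)
        )
    return issues
```

Calling `fetch_open_issues("example-org", "example-repo")` (a placeholder repository) would return a list of issue dictionaries that could then be passed to a skill classifier. Unauthenticated requests are subject to a low hourly rate limit, so passing a token is advisable for larger crawls.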
Problem

Research questions and friction points this paper is trying to address.

Open Source Software
New Contributor Onboarding
Skill Matching
Innovation

Methods, ideas, or system contributions that make the work stand out.

SkillScope
Multi-level Programming Skills Prediction
Large Language Model and Random Forest
Benjamin C. Carter
Grand Canyon University, USA
Jonathan Rivas Contreras
Grand Canyon University, USA
Carlos A. Llanes Villegas
Grand Canyon University, USA
Pawan Acharya
Northern Arizona University, USA
Jack Utzerath
Grand Canyon University, USA
Adonijah O. Farner
Grand Canyon University, USA
Hunter Jenkins
Grand Canyon University, USA
Dylan Johnson
Grand Canyon University, USA
Jacob Penney
Northern Arizona University, USA
Igor Steinmacher
Northern Arizona University
Software Engineering, CSCW, Mining Software Repositories, Open Source Software
Marco A. Gerosa
Northern Arizona University, USA
Fabio Santos
Colorado State University
Software Engineering, Artificial Intelligence, Knowledge Modeling, Open Source Software, Social