SkillScope: A Tool to Predict Fine-Grained Skills Needed to Solve Issues on GitHub

📅 2025-01-27

🤖 AI Summary

New open-source contributors often struggle to identify suitable tasks because issue descriptions rarely state the skills required and existing labeling schemes are too coarse-grained. To address this, we propose a fine-grained, multi-level programming skill prediction method that combines large language models (LLMs) with random forests (RFs). The approach uses GitHub API data, issue mining from Java projects, and ontology-based skill modeling to generate semantically rich, interpretable, and structured skill predictions. Unlike conventional tag-based systems, which rely on a small set of labels, it can identify 217 distinct fine-grained skills. In a case study, it achieves on average 91% precision, 88% recall, and 89% F1-score, substantially outperforming state-of-the-art skill annotation techniques. This work advances task–contributor matching through explainable, scalable, and empirically validated skill inference.
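To make the data-collection step concrete, here is a minimal sketch of how open issues could be pulled from the GitHub REST API before any skill prediction happens. It is not SkillScope's actual code: the helper names (`issues_url`, `keep_issues_only`, `to_model_input`, `fetch_open_issues`) are hypothetical, and joining title and body into one text is an assumption about what a predictor would read. The endpoint, the `state`, `per_page`, and `page` parameters, and the `pull_request` marker on returned items come from the public GitHub API.

```python
import json
import urllib.parse
import urllib.request

GITHUB_API = "https://api.github.com"


def issues_url(owner, repo, page=1, per_page=100):
    """Build the REST URL that lists open issues for one repository."""
    query = urllib.parse.urlencode(
        {"state": "open", "per_page": per_page, "page": page}
    )
    return f"{GITHUB_API}/repos/{owner}/{repo}/issues?{query}"


def keep_issues_only(items):
    """Drop pull requests: the issues endpoint returns them too,
    marked by a "pull_request" key."""
    return [item for item in items if "pull_request" not in item]


def to_model_input(issue):
    """Join title and body into one text (an assumed input format)."""
    return f"{issue.get('title') or ''}\n{issue.get('body') or ''}".strip()


def fetch_open_issues(owner, repo, page=1):
    """Fetch one page of open issues (unauthenticated network call)."""
    request = urllib.request.Request(
        issues_url(owner, repo, page),
        headers={"Accept": "application/vnd.github+json"},
    )
    with urllib.request.urlopen(request) as response:
        return keep_issues_only(json.load(response))
```

Calling `fetch_open_issues("some-owner", "some-repo")` would return one page of issue dictionaries; unauthenticated requests are rate-limited, so a real tool would likely send a token and follow pagination.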

📝 Abstract

New contributors often struggle to find tasks that they can tackle when onboarding onto a new Open Source Software (OSS) project. One reason for this difficulty is that issue trackers lack explanations about the knowledge or skills needed to complete a given task successfully. These explanations can be complex and time-consuming to produce. Past research has partially addressed this problem by labeling issues with issue types, issue difficulty level, and issue skills. However, current approaches are limited to a small set of labels and lack in-depth details about their semantics, which may not sufficiently help contributors identify suitable issues. To surmount this limitation, this paper explores large language models (LLMs) and Random Forest (RF) to predict the multilevel skills required to solve the open issues. We introduce a novel tool, SkillScope, which retrieves current issues from Java projects hosted on GitHub and predicts the multilevel programming skills required to resolve these issues. In a case study, we demonstrate that SkillScope could predict 217 multilevel skills for tasks with 91% precision, 88% recall, and 89% F-measure on average. Practitioners can use this tool to better delegate or choose tasks to solve in OSS projects.
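The reported precision, recall, and F-measure are averages over many skill labels. As an illustration of how such averages can be computed for multi-label predictions, the sketch below macro-averages per-label scores. Whether the paper uses macro-averaging is an assumption, the function name `macro_scores` is hypothetical, and the skill names and predictions are made-up toy data rather than results from the study.

```python
def macro_scores(true_sets, predicted_sets):
    """Macro-averaged precision, recall, and F-measure over skill labels.

    Each argument is a list of sets, one set of skill labels per issue.
    """
    labels = set().union(*true_sets, *predicted_sets)
    if not labels:
        return 0.0, 0.0, 0.0

    precisions, recalls, f_measures = [], [], []
    pairs = list(zip(true_sets, predicted_sets))
    for label in sorted(labels):
        tp = sum(label in t and label in p for t, p in pairs)
        fp = sum(label not in t and label in p for t, p in pairs)
        fn = sum(label in t and label not in p for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        total = precision + recall
        f_measure = 2 * precision * recall / total if total else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f_measures.append(f_measure)

    n = len(labels)
    return sum(precisions) / n, sum(recalls) / n, sum(f_measures) / n


# Toy data: three issues with hypothetical skill labels.
true_skills = [{"ui", "database"}, {"database"}, {"networking"}]
predicted_skills = [{"ui"}, {"database", "ui"}, {"networking"}]
precision, recall, f_measure = macro_scores(true_skills, predicted_skills)
print(f"precision={precision:.2f} recall={recall:.2f} f={f_measure:.2f}")
```

Macro-averaging weights every skill equally, so rare skills count as much as common ones; a micro-average, which pools counts across labels first, would favor frequent skills instead.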

Problem

Research questions and friction points this paper is trying to address.

Open Source Software

New Contributor Onboarding

Skill Matching

Innovation

Methods, ideas, or system contributions that make the work stand out.

SkillScope

Multi-level Programming Skills Prediction

Large Language Model and Random Forest
