An Approach for Auto Generation of Labeling Functions for Software Engineering Chatbots

📅 2024-10-09
🏛️ arXiv.org
📈 Citations: 0 (influential: 0)
🤖 AI Summary
Problem: High-quality labeled data for software engineering (SE) chatbots are scarce, and manually crafting labeling functions (LFs) is time-consuming and resource-intensive. Method: This paper proposes an approach that generates LFs automatically, with no manual rule authoring, by extracting patterns from existing labeled user queries. The approach is evaluated on four SE datasets (AskGit, MSA, Ask Ubuntu, and Stack Overflow), and the study also examines how the number of LFs affects labeling performance. Results: The generated LFs label data with AUC scores of up to 85.3%, and training the Natural Language Understanding platform (NLU) on the resulting labels improves its performance by up to 27.2% across the studied datasets. The approach offers a way to reduce the annotation effort needed to train NLUs for SE chatbots.

📝 Abstract
Software engineering (SE) chatbots are increasingly gaining attention for their role in enhancing development processes. At the core of chatbots are Natural Language Understanding platforms (NLUs), which enable them to comprehend and respond to user queries. Before NLUs can be deployed, they must be trained with labeled data. However, acquiring such labeled data for SE chatbots is challenging due to the scarcity of high-quality datasets. This challenge arises because training SE chatbots requires specialized vocabulary and phrases not found in typical language datasets. Consequently, chatbot developers often resort to manually annotating user queries to gather the data necessary for training effective chatbots, a process that is both time-consuming and resource-intensive. Previous studies propose approaches to support chatbot practitioners in annotating the queries that users pose. However, these approaches require human intervention to generate rules, called labeling functions (LFs), that identify and categorize user queries based on specific patterns in the data. To address this issue, we propose an approach to automatically generate LFs by extracting patterns from labeled user queries. We evaluate the effectiveness of our approach by applying it to the queries of four diverse SE datasets (namely AskGit, MSA, Ask Ubuntu, and Stack Overflow) and measure the performance improvement gained from training the NLU on the queries labeled by the generated LFs. We find that the generated LFs effectively label data, with AUC scores of up to 85.3%, and improve the NLU's performance by up to 27.2% across the studied datasets. Furthermore, our results show that the number of generated LFs used affects the labeling performance. We believe that our approach can save time and resources in labeling users' queries, allowing practitioners to focus on core chatbot functionalities.
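To make the idea of generating LFs from labeled queries concrete, here is a minimal sketch in Python. It is not the authors' method: the abstract does not say which patterns are extracted or how, so this sketch assumes the simplest possible pattern, a single keyword that occurs only in the queries of one intent. The intent names (`TopBugFixer`, `CommitCount`), the seed queries, and all function names are invented for illustration.

```python
from collections import Counter, defaultdict

ABSTAIN = None  # an LF returns this when it has no opinion about a query
PUNCT = ".,?!:;\"'()"


def tokenize(text):
    """Lowercase, split on whitespace, and strip surrounding punctuation."""
    tokens = (tok.strip(PUNCT) for tok in text.lower().split())
    return [tok for tok in tokens if tok]


def mine_keywords(labeled_queries, top_k=2):
    """For each intent, pick up to top_k tokens that occur in no other intent,
    ranked by how many of that intent's queries contain them."""
    per_intent = defaultdict(Counter)
    for text, intent in labeled_queries:
        per_intent[intent].update(set(tokenize(text)))  # count each query once
    keywords = {}
    for intent, counts in per_intent.items():
        others = set()
        for other_intent, other_counts in per_intent.items():
            if other_intent != intent:
                others.update(other_counts)
        exclusive = [(tok, n) for tok, n in counts.items() if tok not in others]
        exclusive.sort(key=lambda item: (-item[1], item[0]))  # frequency, then A-Z
        keywords[intent] = [tok for tok, _ in exclusive[:top_k]]
    return keywords


def make_keyword_lf(keyword, intent):
    """Build an LF that votes for `intent` when `keyword` appears in the query."""
    def lf(text):
        return intent if keyword in tokenize(text) else ABSTAIN
    lf.__name__ = f"lf_{intent}_{keyword}"
    return lf


def generate_lfs(labeled_queries, top_k=2):
    """Turn every mined (intent, keyword) pair into one labeling function."""
    keywords = mine_keywords(labeled_queries, top_k)
    return [make_keyword_lf(kw, intent)
            for intent, kws in keywords.items() for kw in kws]


# Toy seed data, invented for this sketch (not taken from the studied datasets).
seed = [
    ("who fixed the most bugs last month", "TopBugFixer"),
    ("which developer fixed the most bugs", "TopBugFixer"),
    ("how many commits happened last week", "CommitCount"),
    ("count the commits made in the last release", "CommitCount"),
]
lfs = generate_lfs(seed)
print([lf.__name__ for lf in lfs])
```

Each generated function either votes for one intent or abstains, which is the usual contract for labeling functions in weak supervision. A realistic generator would need richer patterns than single keywords and some filtering of low-quality LFs; those choices are left open here because the abstract does not describe them.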
Problem

Research questions and friction points this paper is trying to address.

Automate labeling function generation for SE chatbots
Reduce manual annotation of specialized SE queries
Improve NLU training efficiency with auto-labeled data
Innovation

Methods, ideas, or system contributions that make the work stand out.

Automatically generate labeling functions from queries
Extract patterns to categorize user queries
Improve NLU performance with auto-labeled data
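
As an illustration of how a set of LFs can turn unlabeled queries into training data, the sketch below combines LF votes by simple majority and measures how many queries receive a label. Majority voting is a stand-in chosen for brevity; the summary above does not state how the paper aggregates LF outputs. The intents, keywords, queries, and helper names are all hypothetical.

```python
from collections import Counter

ABSTAIN = None  # an LF returns this when it has no opinion about a query


def keyword_lf(keyword, intent):
    """Hypothetical LF: vote for `intent` if `keyword` is a word in the query."""
    def lf(text):
        return intent if keyword in text.lower().split() else ABSTAIN
    return lf


def majority_label(lfs, text):
    """Return the intent with the most LF votes, or ABSTAIN on no votes or a tie."""
    votes = Counter(v for v in (lf(text) for lf in lfs) if v is not ABSTAIN)
    if not votes:
        return ABSTAIN
    (top, top_n), *rest = votes.most_common(2)
    if rest and rest[0][1] == top_n:
        return ABSTAIN  # two intents tied for first place
    return top


def coverage(lfs, texts):
    """Fraction of queries that receive a (non-abstain) label."""
    labeled = sum(majority_label(lfs, t) is not ABSTAIN for t in texts)
    return labeled / len(texts)


# Toy LFs and queries, invented for this sketch.
lfs = [
    keyword_lf("commits", "CommitCount"),
    keyword_lf("bugs", "TopBugFixer"),
    keyword_lf("fixed", "TopBugFixer"),
]
unlabeled = [
    "how many commits today",
    "who fixed the most bugs",
    "list open issues",
]

for text in unlabeled:
    print(text, "->", majority_label(lfs, text))
print("coverage with 1 LF:", coverage(lfs[:1], unlabeled))
print("coverage with 3 LFs:", coverage(lfs, unlabeled))
```

On this toy input, adding LFs raises the share of queries that get a label, which gives some intuition for why the number of LFs can matter. It says nothing about the size or direction of the effect reported in the paper.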
Ebube Alor
Data-driven Analysis of Software (DAS) Lab at the Department of Computer Science & Software Engineering, Concordia University, Montreal, QC, Canada
Ahmad Abdellatif
Assistant professor, University of Calgary
Chatbots, Software Engineering, Mining Software Repositories, Empirical Software Engineering
S. Khatoonabadi
Data-driven Analysis of Software (DAS) Lab at the Department of Computer Science & Software Engineering, Concordia University, Montreal, QC, Canada
Emad Shihab
Professor at Concordia University
Software Engineering, SE4AI, Mining Software Repositories, Software Analytics, Software Supply Chain