DeepQuali: Initial results of a study on the use of large language models for assessing the quality of user stories

📅 2026-02-09
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study addresses the challenge of effectively leveraging generative artificial intelligence for quality assessment and improvement of user stories in agile software development. The authors propose DeepQuali, a novel approach that explicitly integrates a large language model (GPT-4o) with established software requirements quality models to enable automated, interpretable evaluation of user stories. Expert evaluations demonstrate that DeepQuali achieves high accuracy in quality scoring and provides reasonable, actionable feedback, underscoring its potential for practical application in requirements engineering. However, reviewers also note the need for further refinement in integrating the method into existing development workflows. This work establishes a new paradigm for AI-driven requirements quality assurance, bridging the gap between generative AI capabilities and formal quality criteria in agile contexts.

📝 Abstract
Generative artificial intelligence (GAI), specifically large language models (LLMs), is increasingly used in software engineering, mainly for coding tasks. However, requirements engineering, particularly requirements validation, has seen limited application of GAI. Current uses of GAI for requirements focus on eliciting, transforming, and classifying requirements, not on quality assessment. We propose and evaluate the LLM-based (GPT-4o) approach "DeepQuali" for assessing and improving requirements quality in agile software development. We applied it to projects in two small companies, where we compared LLM-based quality assessments with expert judgments. Experts also participated in walkthroughs of the solution, provided feedback, and rated their acceptance of the approach. Experts largely agreed with the LLM's quality assessments, especially regarding overall ratings and explanations. However, they did not always agree with the other experts on detailed ratings, suggesting that expertise and experience may influence judgments. Experts recognized the usefulness of the approach but criticized the lack of integration into their workflow. LLMs show potential in supporting software engineers with the quality assessment and improvement of requirements. The explicit use of quality models and explanatory feedback increases acceptance.
Problem

Research questions and friction points this paper is trying to address.

requirements quality
user stories
large language models
requirements validation
agile software development
Innovation

Methods, ideas, or system contributions that make the work stand out.

Large Language Models
Requirements Quality Assessment
Agile Software Development
Explainable AI
Requirements Engineering
Adam Trendowicz
Fraunhofer Institute for Experimental Software Engineering IESE, Fraunhofer Platz 1, 67663 Kaiserslautern, Germany
Daniel Seifert
Fraunhofer Institute for Experimental Software Engineering IESE, Fraunhofer Platz 1, 67663 Kaiserslautern, Germany
Andreas Jedlitschka
Fraunhofer Institute for Experimental Software Engineering
Data Science · Artificial Intelligence · Empirical Software Engineering
Marcus Ciolkowski
QAware GmbH
Software Engineering
Anton Strahilov
let's dev GmbH & Co. KG, Alter Schlachthof 33, 76131 Karlsruhe, Germany