DeepQuali: Initial results of a study on the use of large language models for assessing the quality of user stories

📅 2026-02-09
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study addresses the challenge of effectively leveraging generative artificial intelligence for quality assessment and improvement of user stories in agile software development. The authors propose DeepQuali, a novel approach that explicitly integrates a large language model (GPT-4o) with established software requirements quality models to enable automated, interpretable evaluation of user stories. Expert evaluations demonstrate that DeepQuali achieves high accuracy in quality scoring and provides reasonable, actionable feedback, underscoring its potential for practical application in requirements engineering. However, reviewers also note the need for further refinement in integrating the method into existing development workflows. This work establishes a new paradigm for AI-driven requirements quality assurance, bridging the gap between generative AI capabilities and formal quality criteria in agile contexts.

📝 Abstract
Generative artificial intelligence (GAI), specifically large language models (LLMs), is increasingly used in software engineering, mainly for coding tasks. However, requirements engineering, particularly requirements validation, has seen limited application of GAI. Current uses of GAI for requirements focus on eliciting, transforming, and classifying requirements, not on quality assessment. We propose and evaluate the LLM-based (GPT-4o) approach "DeepQuali" for assessing and improving requirements quality in agile software development. We applied it to projects in two small companies, where we compared LLM-based quality assessments with expert judgments. Experts also participated in walkthroughs of the solution, provided feedback, and rated their acceptance of the approach. Experts largely agreed with the LLM's quality assessments, especially regarding overall ratings and explanations. However, they did not always agree with the other experts on detailed ratings, suggesting that expertise and experience may influence judgments. Experts recognized the usefulness of the approach but criticized the lack of integration into their workflow. LLMs show potential in supporting software engineers with the quality assessment and improvement of requirements. The explicit use of quality models and explanatory feedback increases acceptance.
Problem

Research questions and friction points this paper is trying to address.

requirements quality
user stories
large language models
requirements validation
agile software development
Innovation

Methods, ideas, or system contributions that make the work stand out.

Large Language Models
Requirements Quality Assessment
Agile Software Development
Explainable AI
Requirements Engineering
Adam Trendowicz
Fraunhofer Institute for Experimental Software Engineering IESE, Fraunhofer Platz 1, 67663 Kaiserslautern, Germany
Daniel Seifert
Fraunhofer Institute for Experimental Software Engineering IESE, Fraunhofer Platz 1, 67663 Kaiserslautern, Germany
Andreas Jedlitschka
Fraunhofer Institute for Experimental Software Engineering
Data Science · Artificial Intelligence · Empirical Software Engineering
Marcus Ciolkowski
QAware GmbH
Software Engineering
Anton Strahilov
let's dev GmbH & Co. KG, Alter Schlachthof 33, 76131 Karlsruhe, Germany