🤖 AI Summary
Open-source user stories and technical requirements documents for AI systems are scarce, which limits research in AI requirements engineering. To address this, the paper proposes a large language model (LLM)-based method for automated user story generation. First, the authors construct UStAI, which they present as the first open-source user story dataset focused on AI systems; it comprises 1,260 user stories derived from 42 paper abstracts across 26 domains. Second, they introduce a hybrid evaluation framework that combines the Quality User Story (QUS) criteria with a mapping of the generated stories to AI ethics principles and non-functional requirements. The empirical evaluation indicates that models including GPT-4, Claude, and Llama can generate user stories that reflect the needs of diverse stakeholders. UStAI is publicly released to support AI requirements elicitation, ethical alignment, and empirical research in requirements engineering.
📝 Abstract
AI systems are gaining widespread adoption across various sectors and domains. Creating high-quality AI system requirements is crucial for aligning the AI system with business goals and consumer values and for social responsibility. However, given the uncertain nature of AI systems and their heavy reliance on sensitive data, more research is needed on the elicitation and analysis of AI system requirements. Because many AI systems are proprietary, open-source requirements artifacts and technical requirements documents for AI systems are lacking, which limits broader research and investigation. With Large Language Models (LLMs) emerging as a promising alternative to human-generated text, this paper investigates the potential use of LLMs to generate user stories for AI systems based on abstracts from scholarly papers. We conducted an empirical evaluation using three LLMs and generated 1,260 user stories from 42 abstracts across 26 domains. We assess their quality using the Quality User Story (QUS) framework. Moreover, we identify the non-functional requirements (NFRs) and ethical principles relevant to the generated stories. Our analysis demonstrates that the investigated LLMs can generate user stories inspired by the needs of various stakeholders, offering a promising approach for generating user stories for research purposes and for aiding the early requirements elicitation phase of AI systems. We have compiled and curated the stories generated by the LLMs into a dataset (UStAI), which is now publicly available.
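To make the quality assessment step more concrete, the following is a minimal sketch of how two QUS-style checks could be automated for a single generated story. It is not the authors' evaluation procedure. It assumes stories follow the common "As a <role>, I want <means>, so that <ends>" template, and the regular expression, the `UserStory` class, the `check_story` function, and the conjunction-based atomicity heuristic are all illustrative assumptions.

```python
import re
from dataclasses import dataclass
from typing import Dict, Optional

# Assumed template: "As a <role>, I want <means>, so that <ends>."
# This pattern is a hypothetical illustration, not taken from the paper.
STORY_PATTERN = re.compile(
    r"^\s*As an? (?P<role>.+?),\s*I want (?:to )?(?P<means>.+?)"
    r"(?:,?\s*so that (?P<ends>.+?))?\.?\s*$",
    re.IGNORECASE,
)


@dataclass
class UserStory:
    role: str
    means: str
    ends: Optional[str]  # the "so that" rationale, if present


def parse_story(text: str) -> Optional[UserStory]:
    """Split a story into role, means, and ends; return None if it does not fit the template."""
    match = STORY_PATTERN.match(text)
    if match is None:
        return None
    ends = match.group("ends")
    return UserStory(
        role=match.group("role").strip(),
        means=match.group("means").strip(),
        ends=ends.strip() if ends else None,
    )


def check_story(text: str) -> Dict[str, bool]:
    """Run crude, heuristic versions of two QUS-style checks on one story."""
    story = parse_story(text)
    if story is None:
        return {"well_formed": False, "has_rationale": False, "atomic": False}
    # Heuristic assumption: a conjunction in the means suggests more than one feature.
    has_conjunction = re.search(r"\b(and|or)\b", story.means, re.IGNORECASE) is not None
    return {
        "well_formed": True,  # a role and a means were both found
        "has_rationale": story.ends is not None,
        "atomic": not has_conjunction,
    }


if __name__ == "__main__":
    example = (
        "As a radiologist, I want to see a confidence score for each prediction, "
        "so that I can decide when to double-check the result."
    )
    print(parse_story(example))
    print(check_story(example))
```

A pattern-based check like this can only cover the syntactic side of story quality. Criteria that depend on meaning, as well as the mapping to NFRs and ethical principles, would still need human judgment or a more capable analysis step.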