🤖 AI Summary
In requirements elicitation meetings, stakeholder dialogues contain rich requirement information, yet manual extraction is time-consuming, error-prone, and susceptible to subjective bias. Existing approaches primarily focus on dialogue summarization or requirement classification, lacking end-to-end capability to identify contextual cues and generate structured system requirements. This paper proposes a dual-granularity (utterance-level → meeting-level) requirement identification and generation framework that integrates natural language processing with large language models (LLMs), enabling fine-grained cue localization and executable requirement generation. Evaluated through combined performance analysis and engineer-led user studies, our method significantly outperforms baselines in correctness, completeness, and executability. It enhances elicitation efficiency while preserving human engineers' authority in critical judgment tasks, thereby advancing the practical adoption of human-AI collaborative requirements engineering.
📄 Abstract
Stakeholders' conversations in requirements elicitation meetings hold valuable insights into system and client needs. However, manually extracting requirements is time-consuming, labor-intensive, and prone to errors and biases. While current state-of-the-art methods assist in summarizing stakeholder conversations and classifying requirements based on their nature, there is a noticeable lack of approaches capable of both identifying requirements within these conversations and generating corresponding system requirements. Such approaches would support requirement identification, reducing engineers' workload, time, and effort. To address this gap, this paper introduces RECOVER (Requirements EliCitation frOm conVERsations), a novel conversational requirements engineering approach that leverages natural language processing and large language models (LLMs) to support practitioners in automatically extracting system requirements from stakeholder interactions. The approach is evaluated using a mixed-method study that combines performance analysis with a user study involving requirements engineers, targeting two levels of granularity. First, at the conversation turn level, the evaluation measures RECOVER's accuracy in identifying requirements-relevant dialogue and the quality of generated requirements in terms of correctness, completeness, and actionability. Second, at the entire conversation level, the evaluation assesses the overall usefulness and effectiveness of RECOVER in synthesizing comprehensive system requirements from full stakeholder discussions. Empirical evaluation of RECOVER shows promising performance, with generated requirements demonstrating satisfactory correctness, completeness, and actionability. The results also highlight the potential of automating requirements elicitation from conversations as an aid that enhances efficiency while maintaining human oversight.