🤖 AI Summary
Natural language feature requests submitted by users of open-source software often suffer from ambiguity and incompleteness, hindering requirements understanding and development efficiency. In decentralized settings, traditional manual clarification methods (e.g., interviews) do not scale. This paper presents the first systematic application of large language models (LLMs) to automating feature request clarification in open-source contexts, proposing an end-to-end approach that detects semantic defects and generates precise, actionable clarification questions. Through an empirical study on real-world open-source projects, including comparison with human annotations and interviews with developers, we show that the generated questions achieve high agreement with expert annotations (Cohen's κ = 0.82) and are broadly endorsed by practitioners. Our work establishes a novel paradigm and a practical pathway for LLM-enabled requirements engineering in open-source ecosystems.
📝 Abstract
The growing popularity and widespread use of software applications (apps) across diverse domains have driven rapid industry growth, and fast-paced market changes have led to constantly evolving software requirements. Such requirements are often grounded in feature requests and enhancement suggestions, typically provided by users in natural language (NL). However, these requests often suffer from defects such as ambiguity and incompleteness, making them challenging to interpret. Traditional validation methods (e.g., interviews and workshops) help clarify such defects but are impractical in decentralized environments like open-source software (OSS), where change requests originate from diverse users on platforms like GitHub. This paper proposes a novel approach that leverages Large Language Models (LLMs) to detect and refine NL defects in feature requests. Our approach automates the identification of ambiguous and incomplete requests and generates clarification questions (CQs) to enhance their usefulness for developers. To evaluate its effectiveness, we apply our method to real-world OSS feature requests and compare its performance against human annotations. In addition, we conduct interviews with GitHub developers to gain deeper insights into their perceptions of NL defects, the strategies they use to address these defects, and the impact of such defects on downstream software engineering (SE) tasks.