🤖 AI Summary
This study investigates dataset usage patterns in empirical mobile application requirements engineering (RE) research to identify data source bias and its implications for external validity. Following Kitchenham et al.’s systematic mapping methodology, we analyze 43 empirical studies published between 2012 and 2023. Results reveal that over 90% of the studies draw their datasets from Google Play and the Apple App Store, and that dataset-based work concentrates on requirements elicitation and analysis, while activities such as requirements validation and evolution receive little attention. Dataset use has grown over the period, yet the sources remain markedly homogeneous. This work provides the first quantitative evidence of data source narrowing in mobile RE research. To mitigate this risk, we propose a “multi-source data fusion” framework advocating integration across platforms (e.g., F-Droid, GitHub), modalities (e.g., user reviews, source code, changelogs), and RE activities. The framework advances methodological rigor and practical relevance, supporting more generalizable and empirically grounded mobile RE research.
📝 Abstract
[Background] Research on requirements engineering (RE) for mobile apps employs datasets formed by app users, developers or vendors. However, little is known about the sources of these datasets in terms of platforms and the RE activities that were researched with the help of the respective datasets. [Aims] The goal of this paper is to investigate the state of the art of the datasets of mobile apps used in existing RE research. [Method] We carried out a systematic mapping study by following the guidelines of Kitchenham et al. [Results] Based on 43 selected papers, we found that Google Play and Apple App Store provide the datasets for more than 90% of published research in RE for mobile apps. We also found that the most investigated RE activities, based on datasets, are requirements elicitation and requirements analysis. [Conclusions] Our most important conclusions are: (1) there has been a growth in the use of datasets for RE research of mobile apps since 2012, (2) the RE knowledge for mobile apps might be skewed due to the overuse of Google Play and Apple App Store, (3) there are attempts to supplement reviews of apps from repositories with other data sources, (4) there is a need to expand the alternative sources and to experiment with the complementary use of multiple sources, if the community wants more generalizable results. In addition, research is expected to expand to other RE activities beyond elicitation and analysis.