🤖 AI Summary
Problem: Accurate evaluation of analytical methods in simulation studies is hindered by the underreporting and inconsistent handling of "missingness" (e.g., algorithm failure or non-convergence), which compromises validity and reproducibility.
Method: We conducted a large-scale empirical review of 482 methodological simulation studies, systematically extracting metadata and applying qualitative coding to quantify the prevalence and reporting of missingness, complemented by a case study on publication bias adjustment methods.
Contribution/Results: Only 23% (111/482) of the reviewed studies mentioned missingness at all, and merely 14% (67/482) described how it was handled. Based on these findings, we propose a classification of missingness tailored to simulation research, together with actionable recommendations: always quantify and report missingness (even when none is observed), align its handling with the study's goals, and share code and data for reproduction and reanalysis. These recommendations are intended to improve the transparency, comparability, and reproducibility of simulation studies.
📝 Abstract
Simulation studies are commonly used in methodological research for the empirical evaluation of data analysis methods. They generate artificial data sets under specified mechanisms and compare the performance of methods across conditions. However, simulation repetitions do not always produce valid outputs, e.g., due to non-convergence or other algorithmic failures. This phenomenon complicates the interpretation of results, especially when its occurrence differs between methods and conditions. Despite the potentially serious consequences of such "missingness", quantitative data on its prevalence and specific guidance on how to deal with it are currently limited. To address this gap, we reviewed 482 simulation studies published in various methodological journals and systematically assessed the prevalence and handling of missingness. We found that only 23% (111/482) of the reviewed simulation studies mention missingness, with even fewer reporting its frequency (92/482 = 19%) or how it was handled (67/482 = 14%). We propose a classification of missingness and possible solutions. We give several recommendations, most notably to always quantify and report missingness, even if none was observed, to align missingness handling with study goals, and to share code and data for reproduction and reanalysis. Using a case study on publication bias adjustment methods, we illustrate common pitfalls and solutions.
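The central recommendation, to quantify and report missingness for every method and condition even when it is zero, can be made concrete with a short sketch. The following Python loop is a minimal illustration, not the paper's implementation: the method names, conditions, and failure rates are entirely hypothetical. It records failed repetitions per cell instead of silently dropping them, so that performance results can be interpreted alongside failure rates.

```python
# Minimal sketch of a simulation loop that tracks "missingness"
# (non-convergence) per method and condition. All names and failure
# rates below are hypothetical, chosen only for illustration.
import random
from collections import defaultdict

random.seed(1)

def run_method(method, condition):
    """Hypothetical analysis step: returns an estimate, or raises on
    non-convergence. Failure rates here are made up for illustration."""
    fail_rate = {"method_A": 0.02, "method_B": 0.15}[method]
    if random.random() < fail_rate * condition["difficulty"]:
        raise RuntimeError("non-convergence")
    return random.gauss(0, 1)  # stand-in for a performance estimate

methods = ["method_A", "method_B"]
conditions = [{"n": 50, "difficulty": 2.0}, {"n": 500, "difficulty": 0.5}]
n_reps = 1000

results = defaultdict(list)   # valid outputs per (method, condition)
missing = defaultdict(int)    # failed repetitions per (method, condition)

for cond_id, cond in enumerate(conditions):
    for method in methods:
        for rep in range(n_reps):
            try:
                results[(method, cond_id)].append(run_method(method, cond))
            except RuntimeError:
                missing[(method, cond_id)] += 1  # record, don't drop silently

# Report missingness for every cell, even when it is zero, so that
# differences in failure rates between methods remain visible.
for cond_id, cond in enumerate(conditions):
    for method in methods:
        m = missing[(method, cond_id)]
        n_valid = len(results[(method, cond_id)])
        print(f"{method}, condition {cond_id} (n={cond['n']}): "
              f"{m}/{n_reps} missing ({100 * m / n_reps:.1f}%), "
              f"{n_valid} valid")
```

Reporting all cells, including zero-missingness ones, makes clear whether a method's apparent advantage rests only on the repetitions where it happened to converge.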