Crossover Designs in Software Engineering Experiments: Review of the State of Analysis

📅 2024-08-14
🏛️ International Symposium on Empirical Software Engineering and Measurement
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study identifies the long-standing neglect of validity threats—particularly carryover effects—in crossover-design experiments within software engineering (SE). Method: Following Vegas et al.’s guidelines, we conducted the first quantitative assessment of 67 crossover-design experiments reported in 136 SE papers (2015–2024), employing forward snowball sampling and systematic content coding. Contribution/Results: Only 29.5% of validity threats were adequately addressed, and carryover effects were explicitly modeled in a mere 3% of studies. Although overall analytical validity has improved compared to earlier periods, practical adherence remains critically insufficient. Crucially, this work provides the first empirical evidence that low guideline adoption is a primary bottleneck. We propose a novel, taxonomy-based framework for classifying and assessing validity threats, offering actionable pathways and empirical grounding to enhance methodological rigor in SE experimentation.

📝 Abstract
Experimentation is an essential method for causal inference in any empirical discipline. Crossover-design experiments are common in Software Engineering (SE) research. In these, subjects apply more than one treatment in different orders. This design increases the amount of obtained data and deals with subject variability, but introduces threats to internal validity such as the learning and carryover effects. Vegas et al. reviewed the state of practice for crossover designs in SE research and provided guidelines on how to address these threats during data analysis while still harnessing the design's benefits. In this paper, we reflect on the impact of these guidelines and review the state of analysis of crossover-design experiments in SE publications between 2015 and March 2024. To this end, by conducting a forward snowballing of the guidelines, we survey 136 publications reporting 67 crossover-design experiments and evaluate their data analysis against the provided guidelines. The results show that the validity of data analyses has improved compared to the original state of analysis. Still, despite the explicit guidelines, only 29.5% of all threats to validity were addressed properly. While the maturation and optimal-sequence threats are properly addressed in 35.8% and 38.8% of all studies in our sample, respectively, the carryover threat is modeled in only about 3% of the observed cases. The lack of adherence to the analysis guidelines threatens the validity of the conclusions drawn from crossover-design experiments.
Problem

Research questions and friction points this paper is trying to address.

Crossover-Design Experiments
Data Analysis Quality
Accuracy of Experimental Results
Innovation

Methods, ideas, or system contributions that make the work stand out.

Crossover-Design Experiments
Software Engineering Research
Methodological Compliance