A Dataset For Computational Reproducibility

📅 2025-04-11
📈 Citations: 0
Influential: 0
🤖 AI Summary
Scientific computing artifacts, such as analysis scripts and software prototypes, frequently suffer from poor reproducibility due to environmental heterogeneity, dependency drift, and inadequate documentation, undermining research credibility. To address this, we introduce a cross-disciplinary, structured, and standardized benchmark dataset of computational experiments, spanning workflows from single-script executions to complex multi-language pipelines. Our framework uniformly models metadata, standardizes dependency declarations (e.g., requirements.txt, Dockerfiles), encapsulates multi-language execution procedures, and prescribes a rigorous documentation protocol. The dataset comprises dozens of human-validated, fully reproducible experimental cases, enabling objective and comparable evaluation of reproducibility tools. This work fills a critical gap by providing the first systematic, community-grounded benchmark for assessing computational reproducibility, thereby enhancing the rigor, transparency, and comparability of reproducibility research.
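The summary describes a uniform metadata model and standardized dependency declarations, but does not specify the schema. As a purely illustrative sketch, a per-experiment record might be validated along these lines; every field name below is an assumption for illustration, not the dataset's actual format:

```python
# Hypothetical validator for a per-experiment metadata record.
# All field names are illustrative assumptions, not the dataset's real schema.
REQUIRED_FIELDS = {"name", "languages", "dependency_files", "run_steps", "expected_output"}

def validate_experiment(record: dict) -> list:
    """Return a list of problems; an empty list means the record is well-formed."""
    problems = ["missing field: " + f for f in sorted(REQUIRED_FIELDS - record.keys())]
    if not record.get("run_steps"):
        problems.append("run_steps must list at least one command")
    return problems

example = {
    "name": "stats-analysis-01",
    "languages": ["Python", "R"],
    "dependency_files": ["requirements.txt", "Dockerfile"],
    "run_steps": ["docker build -t exp .", "docker run exp"],
    "expected_output": "results/summary.csv",
}
```

A check like this mirrors the paper's idea that every experiment, from a single script to a multi-language pipeline, carries the same kind of machine-readable description.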

📝 Abstract
Ensuring the reproducibility of scientific work is crucial, as it allows the consistent verification of scientific claims and facilitates the advancement of knowledge by providing a reliable foundation for future research. However, scientific work based on computational artifacts, such as scripts for statistical analysis or software prototypes, faces significant challenges in achieving reproducibility. These challenges stem from the variability of computational environments, rapid software evolution, and inadequate documentation of procedures. As a consequence, such artifacts are often not (easily) reproducible, undermining the credibility of scientific findings. Evaluating reproducibility approaches, in particular tools, is challenging in many respects, one being the need to test them with the correct inputs, in this case computational experiments. This article therefore introduces a curated dataset of computational experiments covering a broad spectrum of scientific fields, incorporating details about the software dependencies, execution steps, and configurations necessary for accurate reproduction. The dataset is structured to reflect diverse computational requirements and methodologies, ranging from simple scripts to complex, multi-language workflows, ensuring it represents the wide range of challenges researchers face in reproducing computational studies. It provides a universal benchmark by establishing a standardized dataset for objectively evaluating and comparing the effectiveness of reproducibility tools. Each experiment in the dataset is carefully documented for ease of use: the instructions follow a common standard, so every experiment is described in the same way, making it easier for researchers to run each of them with their own reproducibility tool.
Problem

Research questions and friction points this paper is trying to address.

Ensuring reproducibility of computational scientific work
Addressing variability in computational environments and software
Providing standardized dataset for evaluating reproducibility tools
Innovation

Methods, ideas, or system contributions that make the work stand out.

Curated dataset for computational experiments reproducibility
Standardized instructions for diverse computational workflows
Universal benchmark for evaluating reproducibility tools
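The benchmark idea above amounts to iterating over the experiments and recording whether a given reproducibility tool can re-execute each one. A minimal sketch under stated assumptions: `tool_reproduces` is a hypothetical callback standing in for whatever tool is being evaluated, and is not part of the paper:

```python
# Hypothetical harness: score a reproducibility tool against the dataset.
# `experiments` is a list of metadata records; `tool_reproduces` is an
# assumed callback that returns True if the tool re-executed the experiment.
def benchmark(experiments, tool_reproduces):
    """Return (success_rate, failed_names) for a tool over the dataset."""
    failed = [e["name"] for e in experiments if not tool_reproduces(e)]
    rate = 1 - len(failed) / len(experiments) if experiments else 0.0
    return rate, failed
```

Because every experiment follows the same documentation standard, the same harness can compare different tools on identical inputs, which is the comparability the dataset aims to provide.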