Towards dimensions and granularity in a unified workflow and data provenance framework

📅 2025-04-15
🤖 AI Summary
In scientific provenance, the disconnect between workflow and data provenance, together with inconsistent treatment of dimensions and granularity, undermines trustworthiness and reproducibility. To address this, the authors propose a unified framework that systematically integrates workflow and data provenance. The approach introduces a joint representation of dimensions and granularity, addresses the W7+1 provenance questions, and supports domain-adaptable, end-to-end, fine-grained provenance modeling across the full research lifecycle. Demonstrated on a biomedical research use case, the framework traces final results back to the raw data and analytical steps that produced them, improving transparency, verifiability, and cross-study reproducibility. Its core contribution is modeling workflow and data provenance together along two orthogonal axes: dimensions (e.g., who, what, when) and granularity levels (e.g., task level, operation level, byte level).
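To make the two axes concrete, the following is a minimal Python sketch, assuming a flat record type in which each provenance statement carries a granularity level, a provenance kind (workflow or data), and a few W7-style dimensions. The names `ProvenanceRecord`, `Granularity`, `answer`, and `lineage`, as well as the example file and step names, are hypothetical illustrations; this is not the paper's actual model.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional, Tuple


class Granularity(Enum):
    # Hypothetical levels, borrowed from the examples in the summary above
    TASK = "task"
    OPERATION = "operation"
    BYTE = "byte"


@dataclass(frozen=True)
class ProvenanceRecord:
    """One provenance statement placed on both axes (hypothetical structure)."""
    granularity: Granularity
    kind: str                          # "workflow" or "data" provenance
    what: str                          # the step or artifact described
    who: Optional[str] = None          # responsible agent
    when: Optional[str] = None         # ISO 8601 timestamp
    derived_from: Tuple[str, ...] = () # names of inputs (steps or artifacts)


def answer(records: List[ProvenanceRecord], question: str, what: str,
           level: Optional[Granularity] = None) -> List[str]:
    """Return the values of one dimension ('who' or 'when') for an item,
    optionally restricted to a single granularity level."""
    return [
        getattr(r, question)
        for r in records
        if r.what == what
        and (level is None or r.granularity == level)
        and getattr(r, question) is not None
    ]


def lineage(records: List[ProvenanceRecord], what: str) -> List[str]:
    """Follow derived_from links from a result back to its raw inputs,
    crossing between workflow steps and data artifacts."""
    by_name = {r.what: r for r in records}
    seen: List[str] = []
    stack = [what]
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # already visited; avoid revisiting shared inputs
        seen.append(current)
        rec = by_name.get(current)
        if rec is not None:
            stack.extend(rec.derived_from)
    return seen


# Hypothetical example: one workflow step and two data artifacts
records = [
    ProvenanceRecord(Granularity.TASK, "workflow", "normalize",
                     who="alice", when="2025-01-10T09:00:00Z",
                     derived_from=("raw.csv",)),
    ProvenanceRecord(Granularity.OPERATION, "data", "normalized.csv",
                     who="alice", when="2025-01-10T09:05:00Z",
                     derived_from=("normalize", "raw.csv")),
    ProvenanceRecord(Granularity.TASK, "data", "result.png",
                     when="2025-01-10T10:00:00Z",
                     derived_from=("normalized.csv",)),
]

print(answer(records, "who", "normalized.csv"))  # who produced this file?
print(lineage(records, "result.png"))            # what does the result depend on?
```

Keeping workflow steps and data artifacts in one record type is a design choice made only for this sketch: it lets a single traversal answer "where did this result come from?" across both provenance types, which is the kind of integration the summary describes.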

📝 Abstract
Provenance information is essential for the traceability of scientific studies and experiments and thus crucial for ensuring the credibility and reproducibility of research findings. This paper discusses a comprehensive provenance framework that combines two types of provenance, (1) workflow provenance and (2) data provenance, together with their dimensions and granularity, which makes it possible to answer the W7+1 provenance questions. We demonstrate its applicability using a biomedical research use case that can easily be transferred to other scientific fields. Integrating these concepts into a unified framework supports the credibility and reproducibility of research findings.
Problem

Research questions and friction points this paper is trying to address.

Develop unified framework for workflow and data provenance
Address dimensions and granularity in provenance tracking
Enhance research credibility and reproducibility via provenance
Innovation

Methods, ideas, or system contributions that make the work stand out.

Combines workflow and data provenance in a single framework
Enables answering the W7+1 provenance questions
Integrates dimensions and granularity for reproducibility
Tanja Auge
University of Regensburg, Germany
Sascha Genehr
University of Rostock, Germany; Rostock University Library, Germany
Meike Klettke
University of Regensburg
Data Engineering, Schema Evolution, Database Technology, Information Integration, NoSQL databases
Frank Krüger
Wismar University of Applied Sciences, Germany
Max Schröder
Rostock University Library, Germany