Towards dimensions and granularity in a unified workflow and data provenance framework

📅 2025-04-15

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

In scientific provenance, workflow provenance and data provenance are usually captured separately, with inconsistent dimensions and granularity, which undermines trustworthiness and reproducibility. To address this, we propose the first unified framework that systematically integrates workflow and data provenance. Our approach introduces a joint dimension-granularity representation model, formally defines the W7+1 provenance problem, and enables domain-adaptable, end-to-end, fine-grained provenance modeling across the full research lifecycle. Demonstrated on a biomedical research use case, the framework provides traceability from raw data through analytical steps to final results, which supports transparency, verifiability, and cross-study reproducibility. The core innovation is the orthogonal co-modeling of workflow and data provenance along both dimensional axes (e.g., who, what, when) and granularity levels (e.g., task-level, operation-level, byte-level).
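The traceability described above, from a final result back through analytical steps to raw data, can be illustrated with a small lineage walk. This is a minimal sketch and not the paper's implementation: the `DERIVED_FROM` graph, the file names, the step names, and both functions are hypothetical and exist only for illustration.

```python
from collections import deque

# Hypothetical lineage graph (an assumption for illustration only):
# each artifact maps to the (step, input) pairs that produced it.
DERIVED_FROM = {
    "figure.png": [("plot", "stats.csv")],
    "stats.csv": [("analyze", "clean.csv")],
    "clean.csv": [("filter", "raw.csv")],
}


def trace_back(artifact, derived_from):
    """Return (step, input, output) edges reachable backwards from artifact."""
    edges = []
    seen = {artifact}
    queue = deque([artifact])
    while queue:
        current = queue.popleft()
        for step, source in derived_from.get(current, []):
            edges.append((step, source, current))
            if source not in seen:
                seen.add(source)
                queue.append(source)
    return edges


def raw_inputs(artifact, derived_from):
    """Return the inputs in the lineage that were not derived from anything."""
    sources = {src for _, src, _ in trace_back(artifact, derived_from)}
    return sorted(s for s in sources if s not in derived_from)


if __name__ == "__main__":
    for step, src, out in trace_back("figure.png", DERIVED_FROM):
        print(f"{out} <- {step}({src})")
    print("raw inputs:", raw_inputs("figure.png", DERIVED_FROM))
```

Here the steps play the role of workflow provenance and the artifacts the role of data provenance; keeping both in one graph is what lets a single query answer questions about either.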

📝 Abstract

Provenance information is essential for the traceability of scientific studies and experiments, and is thus crucial for ensuring the credibility and reproducibility of research findings. This paper discusses a comprehensive provenance framework that combines two types of provenance, (1) workflow provenance and (2) data provenance, together with their dimensions and granularity, and thereby enables answering the W7+1 provenance questions. We demonstrate its applicability with a biomedical research use case that can be easily transferred to other scientific fields. Integrating these concepts into a unified framework supports the credibility and reproducibility of research findings.
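One way to picture how dimensions and granularity combine is a record type that carries one field per provenance question plus a granularity level and a provenance type. The sketch below is an assumption-laden illustration, not the paper's data model: the seven field names follow the W7 model as it is commonly described (who, what, when, where, how, which, why), the additional "+1" question is not spelled out in this summary and is therefore omitted, the granularity levels are the examples named in the summary above, and all class names, function names, and sample values are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Kind(Enum):
    WORKFLOW = "workflow"
    DATA = "data"


class Granularity(Enum):
    # Example levels taken from the summary; the paper may define others.
    TASK = "task"
    OPERATION = "operation"
    BYTE = "byte"


# Assumed W7 dimensions; the "+1" question is intentionally left out.
W7 = ("who", "what", "when", "where", "how", "which", "why")


@dataclass(frozen=True)
class ProvenanceRecord:
    kind: Kind
    granularity: Granularity
    who: str
    what: str
    when: str
    where: str
    how: str
    which: str
    why: str


def answer(records, question, kind=None, granularity=None):
    """Collect the values of one W7 dimension, optionally filtered."""
    if question not in W7:
        raise ValueError(f"unknown provenance question: {question}")
    return [
        getattr(r, question)
        for r in records
        if (kind is None or r.kind is kind)
        and (granularity is None or r.granularity is granularity)
    ]


# Hypothetical sample records for illustration only.
RECORDS = [
    ProvenanceRecord(Kind.WORKFLOW, Granularity.TASK, "analyst", "run filter step",
                     "2025-01-10", "cluster", "script", "pipeline v1", "quality control"),
    ProvenanceRecord(Kind.DATA, Granularity.OPERATION, "analyst", "clean.csv written",
                     "2025-01-10", "cluster", "row filter", "raw.csv", "remove outliers"),
]

if __name__ == "__main__":
    print(answer(RECORDS, "what", kind=Kind.DATA))
```

Treating the provenance type, the granularity level, and the question as independent filters mirrors the idea that dimensions and granularity are separate axes of the same record.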

Problem

Research questions and friction points this paper is trying to address.

Develop a unified framework for workflow and data provenance

Address dimensions and granularity in provenance tracking

Enhance research credibility and reproducibility via provenance

Innovation

Methods, ideas, or system contributions that make the work stand out.

Combines workflow and data provenance in a single framework

Answers W7+1 provenance questions effectively

Integrates dimensions and granularity for reproducibility

🔎 Similar Papers

Representing provenance and track changes of cultural heritage metadata in RDF: a survey of existing approaches