🤖 AI Summary
To address insufficient integrity of scientific data provenance and weak cross-organizational interoperability in multi-institutional collaborative research, this paper proposes a federated provenance architecture integrated with a permissioned blockchain. The architecture adopts a modular, domain-agnostic design that combines persistent identifiers (PIDs), versioned provenance graph modeling, and federated computation mechanisms. It supports decentralized interaction while guaranteeing the immutability, long-term auditability, and cross-platform verifiability of provenance data. Its distinguishing feature is that the permissioned blockchain is embedded directly in the federated provenance workflow, which is intended to overcome the provenance consistency bottlenecks imposed by organizational boundaries. Validation through a distributed prototype is ongoing. The goal is to improve the transparency, accountability, and reproducibility of cross-institutional research data and to provide foundational infrastructure for trustworthy large-scale scientific data analysis.
📝 Abstract
Ensuring the trustworthiness and long-term verifiability of scientific data is a foundational challenge in the era of data-intensive, collaborative research. Provenance metadata plays a key role in this context, capturing the origin, transformation, and usage of research artifacts. However, existing solutions often fall short when applied to distributed, multi-institutional settings. This paper introduces a modular, domain-agnostic architecture for provenance tracking in federated environments, leveraging permissioned blockchain infrastructure to guarantee integrity, immutability, and auditability. The system supports decentralized interaction, persistent identifiers for artifact traceability, and a provenance versioning model that preserves the history of updates. Designed to interoperate with diverse scientific domains, the architecture promotes transparency, accountability, and reproducibility across organizational boundaries. Ongoing work focuses on validating the system through a distributed prototype and exploring its performance in collaborative settings.
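The abstract does not specify the system's data structures. Purely as an illustration of two properties it names, a versioning model that preserves the history of updates and tamper-evident (immutable, auditable) provenance, here is a minimal sketch. It assumes a hypothetical record schema (`pid`, `version`, `activity`, `agent`) and uses a single-process, SHA-256 hash-chained log as a stand-in for a permissioned blockchain. It has no consensus, networking, or access control, and all class and field names are invented for this example.

```python
import hashlib
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class ProvenanceRecord:
    """One immutable provenance entry (hypothetical schema, not the paper's)."""
    pid: str        # persistent identifier of the artifact
    version: int    # per-artifact version number, starting at 1
    activity: str   # what produced or changed the artifact
    agent: str      # institution or user responsible
    prev_hash: str  # digest of the previous entry in the log
    digest: str     # SHA-256 over this entry's fields plus prev_hash


def _digest(pid, version, activity, agent, prev_hash):
    # Canonical JSON (sorted keys) so the same fields always hash identically.
    payload = json.dumps(
        {"pid": pid, "version": version, "activity": activity,
         "agent": agent, "prev_hash": prev_hash},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class ProvenanceLedger:
    """Append-only, hash-chained log standing in for a permissioned ledger."""

    GENESIS = "0" * 64

    def __init__(self):
        self._records = []

    def append(self, pid, activity, agent):
        # Updates never overwrite earlier entries; they add the next version.
        version = 1 + sum(1 for r in self._records if r.pid == pid)
        prev_hash = self._records[-1].digest if self._records else self.GENESIS
        record = ProvenanceRecord(
            pid, version, activity, agent, prev_hash,
            _digest(pid, version, activity, agent, prev_hash),
        )
        self._records.append(record)
        return record

    def history(self, pid):
        # Full update history of one artifact, oldest first.
        return [r for r in self._records if r.pid == pid]

    def verify(self):
        # Recompute every link; altering any past entry breaks the chain.
        prev_hash = self.GENESIS
        for r in self._records:
            if r.prev_hash != prev_hash:
                return False
            if r.digest != _digest(r.pid, r.version, r.activity,
                                   r.agent, r.prev_hash):
                return False
            prev_hash = r.digest
        return True


if __name__ == "__main__":
    ledger = ProvenanceLedger()
    ledger.append("pid:example/dataset-1", "ingest raw measurements", "institution-A")
    ledger.append("pid:example/dataset-1", "normalize units", "institution-B")
    print([r.version for r in ledger.history("pid:example/dataset-1")])
    print(ledger.verify())
```

Because each entry's digest covers the previous entry's digest, changing any historical field invalidates every later link. This is the tamper-evidence property a shared ledger would provide across institutions. In a real federated deployment, the ordering and replication of entries would come from the blockchain's consensus layer rather than from a single in-memory list.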