PRE-Share Data: Assistance Tool for Resource-aware Designing of Data-sharing Pipelines

๐Ÿ“… 2025-03-17
๐Ÿ“ˆ Citations: 0
โœจ Influential: 0
๐Ÿ“„ PDF
๐Ÿค– AI Summary
In cross-organizational data sharing, existing multi-pipeline transformation design suffers from low efficiency and severe resource waste under dual constraints of governance compliance and recipient-side adaptability. Method: This paper proposes a reuse-aware pipeline design assistance paradigm that integrates flowchart-based modeling, semantic matching of transformation operations, fine-grained resource consumption modeling, and heuristic configuration optimization. It enables automatic identification of reusable transformation components across pipelines, recommends optimal pipeline structures, and quantifies potential resource savings. Contribution/Results: As the first design assistance framework supporting predictive reporting generation, it achieves, on real-world use cases, an average 37% reduction in computational resource consumption and a 52% reduction in design cycle time, while remaining compatible with self-service data platform deployments.

Technology Category

Application Category

๐Ÿ“ Abstract
Data is a valuable asset, and sharing it as a product across organizations is key to building comprehensive and useful insights in fields such as science and industry. Before sharing, data often requires transformation to comply with governance policies and meet the requirements of recipient organizations. By leveraging pipelines, these transformations can be modeled as chains of processes; however, designing such pipelines while ensuring their efficiency is complex. In this paper, we present a tool that supports the design of pipelines by identifying opportunities for reusing transformation processes across different pipelines and suggesting designs and configurations based on these opportunities. This tool also generates reports on the resource consumption of pipeline processes, enabling the estimation of potential resource savings achievable through reuse-based designs. It could serve as a foundation for more efficient and resource-conscious data transformation pipeline design and be used as a component in self-service data platforms.
Problem

Research questions and friction points this paper is trying to address.

Designing efficient data-sharing pipelines across organizations
Reusing transformation processes to optimize resource consumption
Ensuring compliance with governance policies and recipient requirements
Innovation

Methods, ideas, or system contributions that make the work stand out.

Tool for reusing transformation processes across pipelines
Generates reports on pipeline resource consumption
Supports efficient, resource-conscious pipeline design
๐Ÿ”Ž Similar Papers
2024-07-23IEEE International Conference on Cluster ComputingCitations: 1