Atlas: A Framework for ML Lifecycle Provenance & Transparency

📅 2025-02-26
📈 Citations: 0
Influential: 0
🤖 AI Summary
The widespread adoption of open-source machine learning (ML) datasets and models has intensified risks such as data poisoning, supply-chain attacks, and regulatory non-compliance. Method: the paper proposes the first verifiable, end-to-end ML provenance framework, integrating Trusted Execution Environments (TEEs) with transparent logging. It builds on the SPDX and SLSA supply-chain standards and leverages Intel SGX, hash-chain-based immutable logging, and zero-knowledge proofs. Contribution/Results: the framework provides provable artifact authenticity, auditable end-to-end lineage, and joint guarantees of confidentiality and integrity, without compromising intellectual-property rights over data or models. Evaluated on two real-world ML pipelines, it achieves 100% detection of metadata tampering and full verifiable traceability from training to deployment with negligible runtime overhead, demonstrating practical viability for secure, compliant ML operations.
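The summary above mentions hash-chain-based immutable logging as one of the framework's integrity mechanisms. A minimal sketch of the general technique (not the paper's actual implementation; class and field names are illustrative) shows why tampering with any logged metadata record is detectable:

```python
import hashlib
import json


def entry_hash(prev_hash: str, record: dict) -> str:
    """Chain a record to its predecessor by hashing (prev_hash || record)."""
    payload = prev_hash + json.dumps(record, sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()


class HashChainLog:
    """Append-only log in which each entry commits to the entry before it."""

    GENESIS = "0" * 64  # fixed sentinel hash for the first entry

    def __init__(self):
        self.entries = []  # list of (record, hash) pairs

    def append(self, record: dict) -> str:
        prev = self.entries[-1][1] if self.entries else self.GENESIS
        h = entry_hash(prev, record)
        self.entries.append((record, h))
        return h

    def verify(self) -> bool:
        """Recompute the chain; any altered record breaks every later link."""
        prev = self.GENESIS
        for record, h in self.entries:
            if entry_hash(prev, record) != h:
                return False
            prev = h
        return True
```

Because each hash covers its predecessor, rewriting one pipeline event (say, a training step) invalidates the whole suffix of the log, which is what makes 100% detection of metadata tampering plausible for this class of design.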

📝 Abstract
The rapid adoption of open-source machine learning (ML) datasets and models exposes today's AI applications to critical risks like data poisoning and supply-chain attacks across the ML lifecycle. With growing regulatory pressure to address these issues through greater transparency, ML model vendors face challenges balancing these requirements against confidentiality needs for data and intellectual property. We propose Atlas, a framework that enables fully attestable ML pipelines. Atlas leverages open specifications for data and software supply-chain provenance to collect verifiable records of model artifact authenticity and end-to-end lineage metadata. Atlas combines trusted hardware and transparency logs to enhance metadata integrity, preserve data confidentiality, and limit unauthorized access during ML pipeline operations, from training through deployment. Our prototype implementation of Atlas integrates several open-source tools to build an ML lifecycle transparency system, and we assess the practicality of Atlas through two case-study ML pipelines.
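The abstract describes collecting verifiable records of model artifact authenticity. A minimal sketch of the underlying idea, binding an artifact name to a content digest in the style of an SPDX checksum entry (the field names here are illustrative assumptions, not the paper's actual schema):

```python
import hashlib


def artifact_record(name: str, data: bytes) -> dict:
    """Bind an artifact to its content digest (SPDX-style checksum fields, illustrative)."""
    return {
        "name": name,
        "algorithm": "SHA256",
        "digest": hashlib.sha256(data).hexdigest(),
    }


def verify_artifact(record: dict, data: bytes) -> bool:
    """Recompute the digest of the bytes in hand and compare to the record."""
    return hashlib.sha256(data).hexdigest() == record["digest"]
```

A consumer who receives both the artifact and its signed record can recompute the digest locally, so authenticity does not depend on trusting the channel the artifact arrived over.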
Problem

Research questions and friction points this paper is trying to address.

Addresses risks in ML lifecycle transparency
Balances regulatory needs with confidentiality
Enhances metadata integrity and data security
Innovation

Methods, ideas, or system contributions that make the work stand out.

Attestable ML pipelines framework
Uses trusted hardware and logs
Ensures data confidentiality and integrity