Trustworthy AI in the Agentic Lakehouse: from Concurrency to Governance

📅 2025-11-20

🤖 AI Summary

Problem: Most enterprises do not trust AI agents with access to production data, in large part because conventional lakehouse architectures were not designed for concurrent agent access and do not provide strong consistency guarantees in decoupled, multi-language execution environments. Method: This paper proposes Bauplan, an agent-first lakehouse design built around transactions, from which governance follows. It draws an operational analogy to MVCC in databases, explains why a direct transplant fails in a decoupled, multi-language setting, and reimplements data and compute isolation in the lakehouse. Key elements include transactional lakehouse semantics, fine-grained data versioning, runtime isolation, and self-healing pipelines. Contribution/Results: The paper shares a reference implementation of a self-healing pipeline in Bauplan that couples agent reasoning with the guarantees needed for correctness and trust, so that agent changes to data remain isolated and auditable under concurrent access.

📝 Abstract

Even as AI capabilities improve, most enterprises do not consider agents trustworthy enough to work on production data. In this paper, we argue that the path to trustworthy agentic workflows begins with solving the infrastructure problem first: traditional lakehouses are not suited for agent access patterns, but if we design one around transactions, governance follows. In particular, we draw an operational analogy to MVCC in databases and show why a direct transplant fails in a decoupled, multi-language setting. We then propose an agent-first design, Bauplan, that reimplements data and compute isolation in the lakehouse. We conclude by sharing a reference implementation of a self-healing pipeline in Bauplan, which seamlessly couples agent reasoning with all the desired guarantees for correctness and trust.
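The MVCC analogy in the abstract can be illustrated with a small in-memory sketch. This is not Bauplan's API: `ToyCatalog`, `create_branch`, `write`, `read`, `merge`, and `MergeConflict` are hypothetical names invented for illustration. The assumption is that each agent works on its own copy-on-write branch of the data, and that a merge back into the parent succeeds only if the parent has not changed since the branch was forked.

```python
from dataclasses import dataclass, field


class MergeConflict(Exception):
    """Raised when the parent branch changed after the source branch was forked."""


@dataclass
class ToyCatalog:
    """Hypothetical in-memory catalog; a stand-in, not a real lakehouse client."""
    # branch name -> snapshot (table name -> tuple of rows)
    branches: dict = field(default_factory=lambda: {"main": {}})
    # branch name -> version counter, bumped on every write or merge
    versions: dict = field(default_factory=lambda: {"main": 0})
    # branch name -> (parent branch, parent version at fork time)
    parents: dict = field(default_factory=dict)

    def create_branch(self, name, from_branch="main"):
        # Copy the table map only; row tuples are immutable and can be shared.
        self.branches[name] = dict(self.branches[from_branch])
        self.versions[name] = 0
        self.parents[name] = (from_branch, self.versions[from_branch])

    def write(self, branch, table, rows):
        # Build a new snapshot so readers of the old one are never affected.
        snapshot = dict(self.branches[branch])
        snapshot[table] = tuple(rows)
        self.branches[branch] = snapshot
        self.versions[branch] += 1

    def read(self, branch, table):
        return self.branches[branch].get(table, ())

    def merge(self, source):
        # Optimistic check: publish only if the parent is unchanged since the fork.
        target, forked_at = self.parents[source]
        if self.versions[target] != forked_at:
            raise MergeConflict(f"{target} changed since {source} was created")
        self.branches[target] = dict(self.branches[source])
        self.versions[target] += 1


catalog = ToyCatalog()
catalog.write("main", "orders", [(1, 10.0)])
catalog.create_branch("agent_1")
catalog.write("agent_1", "orders", [(1, 10.0), (2, 5.0)])
rows_on_main_before_merge = catalog.read("main", "orders")  # agent writes stay invisible
catalog.merge("agent_1")
rows_on_main_after_merge = catalog.read("main", "orders")   # published in one step
```

In this sketch, readers of `main` never observe a half-finished agent run: the agent's writes exist only on its branch until `merge` publishes them in one step, and a second agent that forked from the same version has to re-fork and retry.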

Problem

Research questions and friction points this paper is trying to address.

Traditional lakehouses fail to support agent data access patterns

Existing database concurrency models break in multi-language agent environments

Current infrastructure lacks governance for trustworthy agentic workflows

Innovation

Methods, ideas, or system contributions that make the work stand out.

Agent-first lakehouse design with data isolation

Reimplementation of compute isolation for agents

Self-healing pipeline with correctness guarantees
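A self-healing pipeline of the kind listed above might follow the loop sketched here. This is a minimal illustration under stated assumptions, not the paper's reference implementation: `run_with_self_healing`, `transform`, `validate`, and `propose_fix` are hypothetical names, and `propose_fix` stands in for the agent's reasoning step. The assumption is that each attempt runs on an isolated copy of the data and is published only after validation passes.

```python
from typing import Callable

Rows = list  # a table is modeled as a list of row dicts in this sketch


def run_with_self_healing(
    main: dict,
    table: str,
    transform: Callable[[Rows], Rows],
    validate: Callable[[Rows], None],
    propose_fix: Callable[[Callable, Exception], Callable],
    max_attempts: int = 3,
) -> bool:
    """Run transform on an isolated copy; publish to main only if validation passes."""
    for _ in range(max_attempts):
        branch = dict(main)  # isolated working copy of the table map
        try:
            branch[table] = transform(list(main[table]))
            validate(branch[table])
        except Exception as err:
            # Hypothetical agent step: inspect the failure, return a repaired transform.
            transform = propose_fix(transform, err)
            continue
        main.update(branch)  # publish the validated result
        return True
    return False  # main is left untouched after repeated failures


def double_amounts(rows):
    # Fails with TypeError when an amount is None.
    return [{"id": r["id"], "amount": r["amount"] * 2} for r in rows]


def require_rows(rows):
    if not rows:
        raise ValueError("transform produced no rows")


def drop_null_amounts(transform, err):
    # Stand-in for an agent-proposed repair: filter bad rows before transforming.
    return lambda rows: transform([r for r in rows if r["amount"] is not None])


data = {"orders": [{"id": 1, "amount": 10}, {"id": 2, "amount": None}]}
healed = run_with_self_healing(data, "orders", double_amounts, require_rows, drop_null_amounts)
```

The design choice being illustrated is that the repair loop never touches `main` directly, so a failed or partially repaired attempt cannot corrupt production data.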
