AI Summary
Enterprises widely distrust AI agents' access to production data, primarily because conventional lakehouse architectures do not support high-concurrency access with strong consistency guarantees in multi-language, decoupled execution environments.
Method: This paper proposes Bauplan, a novel lakehouse architecture that natively embeds transactional control and governance mechanisms. It introduces the first MVCC-inspired runtime mechanism tailored for AI agents and redefines the data-compute isolation model. Key innovations include transactional lakehouse semantics, fine-grained data versioning, runtime isolation, and self-healing pipelines.
Contribution/Results: Bauplan enables governance-driven, trustworthy AI agent workflows. Prototype evaluation demonstrates that it ensures data correctness, system stability, and full auditability under high concurrency, so that agents can operate directly on the data infrastructure without compromising it.
Abstract
Even as AI capabilities improve, most enterprises do not consider agents trustworthy enough to work on production data. In this paper, we argue that the path to trustworthy agentic workflows begins with solving the infrastructure problem first: traditional lakehouses are not suited for agent access patterns, but if we design one around transactions, governance follows. In particular, we draw an operational analogy to MVCC in databases and show why a direct transplant fails in a decoupled, multi-language setting. We then propose an agent-first design, Bauplan, that reimplements data and compute isolation in the lakehouse. We conclude by sharing a reference implementation of a self-healing pipeline in Bauplan, which seamlessly couples agent reasoning with all the desired guarantees for correctness and trust.
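To make the MVCC analogy concrete, the following is a minimal in-memory sketch of branch-based isolation with an atomic, all-or-nothing merge. It is not Bauplan's API: `Catalog`, `run_transaction`, `self_heal`, and the toy `orders` table are hypothetical names introduced only for illustration, and table snapshots are modeled as immutable tuples. The idea it illustrates is that an agent writes only to its own branch, a validation check gates the merge, and a failed attempt is discarded without `main` ever observing it.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

Snapshot = Tuple[int, ...]  # immutable table contents (toy stand-in for data files)


class MergeConflict(Exception):
    """Raised when main has advanced since the branch was created."""


@dataclass
class Catalog:
    """Hypothetical toy catalog: each branch maps table names to snapshots."""
    branches: Dict[str, Dict[str, Snapshot]] = field(
        default_factory=lambda: {"main": {}})
    main_version: int = 0
    base_version: Dict[str, int] = field(default_factory=dict)

    def create_branch(self, name: str) -> None:
        # Copy-on-write: only the table pointers are copied, not the data.
        self.branches[name] = dict(self.branches["main"])
        self.base_version[name] = self.main_version

    def write(self, branch: str, table: str, rows: List[int]) -> None:
        self.branches[branch][table] = tuple(rows)

    def read(self, branch: str, table: str) -> Snapshot:
        return self.branches[branch][table]

    def merge(self, branch: str) -> None:
        # Optimistic check: refuse to merge if main moved underneath us.
        if self.base_version[branch] != self.main_version:
            raise MergeConflict(branch)
        # A single pointer swap publishes every table change at once.
        self.branches["main"] = self.branches.pop(branch)
        del self.base_version[branch]
        self.main_version += 1

    def drop(self, branch: str) -> None:
        self.branches.pop(branch, None)
        self.base_version.pop(branch, None)


Job = Callable[[Catalog, str], None]
Check = Callable[[Catalog, str], bool]


def run_transaction(catalog: Catalog, branch: str, job: Job, check: Check) -> bool:
    """Run `job` on an isolated branch; merge only if `check` passes."""
    catalog.create_branch(branch)
    try:
        job(catalog, branch)
        if check(catalog, branch):
            catalog.merge(branch)
            return True
    except Exception:
        pass  # any failure (including a merge conflict) leaves main untouched
    catalog.drop(branch)
    return False


def self_heal(catalog: Catalog, candidates: List[Job], check: Check) -> Optional[int]:
    """Try candidate fixes in order; return the index of the first one merged."""
    for i, job in enumerate(candidates):
        if run_transaction(catalog, f"agent-fix-{i}", job, check):
            return i
    return None


# Usage: the first (bad) fix is rejected on its branch, the second is merged.
catalog = Catalog()
catalog.write("main", "orders", [1, 2, 3])

def bad_fix(c: Catalog, b: str) -> None:
    c.write(b, "orders", [1, -2, 3])  # violates the non-negative check

def good_fix(c: Catalog, b: str) -> None:
    c.write(b, "orders", list(c.read(b, "orders")) + [4])

def non_negative(c: Catalog, b: str) -> bool:
    return all(row >= 0 for row in c.read(b, "orders"))

winner = self_heal(catalog, [bad_fix, good_fix], non_negative)
```

In this sketch the failed attempt never becomes visible on `main`, which is the property the abstract relies on when it couples agent reasoning with correctness guarantees. A real lakehouse would replace the in-memory dictionaries with catalog commits over immutable data files.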