Data Flow Control: Data Safety Policies for AI Agents

📅 2026-06-04
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses a gap between correctness and safety: an AI-generated SQL query may be semantically correct yet still violate the regulatory, privacy, or business policies that govern how data may be combined and released. The paper argues that enforcing such policies is a data infrastructure problem and proposes Data Flow Control (DFC), a framework for declaratively specifying and enforcing policies over tuple-level data flows within a DBMS query. DFC formalizes data safety as aggregate predicates over provenance monomials, with a policy language designed to be optimizer-invariant yet efficient to enforce at scale. Passant, a portable query rewriting layer, enforces DFC policies without materializing provenance. Across five engines (DuckDB, Umbra, PostgreSQL, DataFusion, and SQL Server), the authors report that Passant incurs ~0% overhead and outperforms alternatives by orders of magnitude. They position DFC as a first step toward moving data safety from prompts and post-hoc checks into the data infrastructure.
📝 Abstract
Agents increasingly generate SQL, orchestrate pipelines, and automate data analysis on behalf of users. While recent work improves query correctness, correctness is not safety. A query may be semantically valid yet violate regulatory, privacy, or business constraints that govern how data may be combined and released. We argue that enforcing such constraints is fundamentally a data infrastructure problem. This paper introduces Data Flow Control (DFC), a framework to declaratively specify and guarantee policy enforcement over tuple-level data flows within a DBMS query. A key challenge is defining a policy language that is optimizer-invariant yet efficient to enforce at scale. We formalize data safety as aggregate predicates over provenance monomials and present Passant, a portable query rewriting layer that enforces DFC policies without materializing provenance. Across five DBMS engines -- DuckDB, Umbra, PostgreSQL, DataFusion, and SQL Server -- Passant achieves ~0% overhead and outperforms alternatives by orders of magnitude. As a result, Data Flow Control is the first step towards moving data safety from prompts and post-hoc checks into the data infrastructure. Data Flow Control is available open source at https://github.com/dataflowcontrol/data-flow-control.
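To make the idea of enforcing a policy by query rewriting more concrete, here is a minimal toy sketch. It is not Passant and does not use the DFC policy language, whose syntax is not shown on this page. It assumes a hypothetical policy ("every released row must draw on at least three distinct patients"), a hypothetical `visits` table, and a hypothetical helper `rewrite_with_policy`, and it uses Python's built-in `sqlite3` module rather than any of the five engines named in the abstract. The policy is written as an aggregate predicate over the tuples that contribute to each output row, and it is enforced by adding a `HAVING` clause, so no separate provenance table is stored.

```python
import sqlite3

# Hypothetical policy (not DFC syntax): each released output row must be
# derived from at least MIN_PATIENTS distinct patients in the source table.
MIN_PATIENTS = 3


def rewrite_with_policy(grouped_query: str, min_patients: int) -> str:
    """Toy rewrite: append the policy as a HAVING clause.

    Assumes the query ends with a GROUP BY clause and has no HAVING,
    ORDER BY, or LIMIT. A real rewriter would work on a parsed query.
    """
    return (
        f"{grouped_query} "
        f"HAVING COUNT(DISTINCT patient_id) >= {int(min_patients)}"
    )


def run_demo():
    """Run the agent's query with and without the policy rewrite."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE visits (patient_id INTEGER, diagnosis TEXT, cost REAL)"
    )
    conn.executemany(
        "INSERT INTO visits VALUES (?, ?, ?)",
        [
            (1, "flu", 100.0),
            (2, "flu", 120.0),
            (3, "flu", 80.0),
            (4, "rare_condition", 900.0),  # only one contributing patient
        ],
    )
    # A semantically valid query, as an agent might generate it.
    agent_query = "SELECT diagnosis, AVG(cost) FROM visits GROUP BY diagnosis"
    unchecked = sorted(conn.execute(agent_query).fetchall())
    enforced = sorted(
        conn.execute(rewrite_with_policy(agent_query, MIN_PATIENTS)).fetchall()
    )
    conn.close()
    return unchecked, enforced


if __name__ == "__main__":
    unchecked, enforced = run_demo()
    print("unchecked:", unchecked)
    print("enforced: ", enforced)
```

In this sketch the unchecked query returns a row for `rare_condition` that exposes a single patient's cost, while the rewritten query drops that row because only one patient contributes to it. String concatenation is used only to keep the example short. It would break on queries that already have `HAVING` or `ORDER BY`, which is one reason a practical rewriting layer would need to operate on a parsed representation.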
Problem

Research questions and friction points this paper is trying to address.

Data Safety
AI Agents
Policy Enforcement
Data Flow Control
Regulatory Compliance
Innovation

Methods, ideas, or system contributions that make the work stand out.

Data Flow Control
Provenance
Query Rewriting
Data Safety
Policy Enforcement