Algebraic data integration

📅 2015-03-12
🏛️ Journal of Functional Programming
📈 Citations: 27
Influential citations: 2
🤖 AI Summary
This paper addresses the problem of integrating data across heterogeneous databases. It proposes an algebraic approach that combines functional programming, category theory, and database theory. Database schemas and instances are both presented as many-sorted equational theories: a schema denotes a category, and an instance denotes its initial (term) algebra. A schema morphism $F : S \to T$ induces three adjoint data migration functors, $\Sigma_F \dashv \Delta_F \dashv \Pi_F$, which give a rigorous semantics for moving data between schemas. Building on this, the paper describes a pushout-based design pattern for data integration and a query language with for/where/return syntax in which each query denotes a sequence of data migration functors. The authors implement the formalism in AQL (Algebraic Query Language), an open-source tool for specifying schema mappings, migrating data, and evaluating queries. The aim is to combine mathematical precision with more automated and reliable data integration.
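The pushout-based construction can be illustrated at the level of entity names alone. The Python sketch below computes a pushout of finite sets: given two schemas that share an overlap, it glues their entities along that overlap using union-find. The function name `pushout` and the example entities (`Person`, `Emp`, `Staff`, `Dept`, `Office`) are hypothetical, and foreign keys, attributes, and equations are deliberately ignored, so this is only the set-level shadow of a schema pushout, not the paper's construction.

```python
# Hypothetical sketch: pushout of finite sets, used to glue the entity names
# of two schemas S1 and S2 along a shared overlap S0. Only the entity (object)
# part of a schema is modelled; foreign keys, attributes, and equations are
# ignored, so this is NOT a full schema pushout.


def pushout(s0, s1, s2, f1, f2):
    """Pushout of f1: s0 -> s1 and f2: s0 -> s2 in finite sets.

    Returns (carrier, inj1, inj2): carrier is a set of equivalence classes
    (frozensets of tagged elements); inj1 / inj2 send each element of s1 / s2
    to its class, and inj1[f1[x]] == inj2[f2[x]] for every x in s0.
    """
    # Tag elements so equal names in s1 and s2 stay distinct unless glued.
    nodes = [("1", a) for a in s1] + [("2", b) for b in s2]
    parent = {n: n for n in nodes}

    def find(n):
        # Union-find lookup with path halving.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    # Generate the equivalence relation f1(x) ~ f2(x) for x in the overlap.
    for x in s0:
        r1, r2 = find(("1", f1[x])), find(("2", f2[x]))
        if r1 != r2:
            parent[r1] = r2

    # Group tagged elements by representative to form the quotient set.
    classes = {}
    for n in nodes:
        classes.setdefault(find(n), set()).add(n)
    class_of = {n: frozenset(classes[find(n)]) for n in nodes}

    carrier = set(class_of.values())
    inj1 = {a: class_of[("1", a)] for a in s1}
    inj2 = {b: class_of[("2", b)] for b in s2}
    return carrier, inj1, inj2


# Example: two schemas that share a "Person" entity under different names.
carrier, inj1, inj2 = pushout(
    s0={"Person"},
    s1={"Emp", "Dept"},
    s2={"Staff", "Office"},
    f1={"Person": "Emp"},
    f2={"Person": "Staff"},
)
print(len(carrier))  # 3 classes: {Emp, Staff}, {Dept}, {Office}
```

Tagging elements with their side keeps same-named entities from the two schemas apart unless the overlap explicitly identifies them, which is what distinguishes a pushout from a plain union of names.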
📝 Abstract
In this paper, we develop an algebraic approach to data integration by combining techniques from functional programming, category theory, and database theory. In our formalism, database schemas and instances are algebraic (multi-sorted equational) theories of a certain form. Schemas denote categories, and instances denote their initial (term) algebras. The instances on a schema $S$ form a category, $S\text{–Inst}$, and a morphism of schemas $F : S \to T$ induces three adjoint data migration functors: $\Sigma_F : S\text{–Inst} \to T\text{–Inst}$, defined by substitution along $F$, which has a right adjoint $\Delta_F : T\text{–Inst} \to S\text{–Inst}$, which in turn has a right adjoint $\Pi_F : S\text{–Inst} \to T\text{–Inst}$. We present a query language based on for/where/return syntax, where each query denotes a sequence of data migration functors, and a pushout-based design pattern for performing data integration using our formalism; we also describe the implementation of our formalism in a tool we call AQL (Algebraic Query Language).
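To make the three functors concrete, here is a minimal Python sketch under a strong simplifying assumption: both schemas are discrete (entities only, with no foreign keys, attributes, or equations), so an instance is just a set of rows per entity. In that case $\Delta_F$ is precomposition with $F$, $\Sigma_F$ takes disjoint unions over the fibres of $F$, and $\Pi_F$ takes products over them. The helper names `delta_f`, `sigma_f`, `pi_f` and the example entities are hypothetical; the paper's general constructions on equational theories are considerably more involved.

```python
from itertools import product

# Hypothetical sketch of Delta_F, Sigma_F, and Pi_F in the special case where
# both schemas are *discrete*: entities only, with no foreign keys, attributes,
# or equations. An instance is a dict entity -> set of rows, and a schema
# mapping F is a dict source entity -> target entity.


def delta_f(F, J):
    """Delta_F : T-Inst -> S-Inst, by precomposition: (Delta_F J)(s) = J(F(s))."""
    return {s: set(J[t]) for s, t in F.items()}


def sigma_f(F, I, target_entities):
    """Sigma_F : S-Inst -> T-Inst, disjoint union over each fibre of F."""
    out = {t: set() for t in target_entities}
    for s, t in F.items():
        # Tag each row with its source entity so the union is disjoint.
        out[t] |= {(s, row) for row in I[s]}
    return out


def pi_f(F, I, target_entities):
    """Pi_F : S-Inst -> T-Inst, product over each fibre of F."""
    out = {}
    for t in target_entities:
        fibre = sorted(s for s, t2 in F.items() if t2 == t)
        # One output row = one choice of row per source entity in the fibre;
        # an empty fibre gives the one-row set {()}.
        out[t] = {tuple(zip(fibre, choice)) for choice in product(*(I[s] for s in fibre))}
    return out


# Example: two source entities mapped onto one target entity "Worker".
F = {"Emp": "Worker", "Contractor": "Worker"}
I = {"Emp": {"alice", "bob"}, "Contractor": {"carol"}}
print(len(sigma_f(F, I, {"Worker"})["Worker"]))  # 3 = 2 + 1 (disjoint union)
print(len(pi_f(F, I, {"Worker"})["Worker"]))     # 2 = 2 * 1 (product)
print(delta_f(F, {"Worker": {"w1"}}))            # {'Emp': {'w1'}, 'Contractor': {'w1'}}
```

Disjoint union and product are the left and right Kan extensions along a functor between discrete categories, which is why the example sizes come out as a sum and a product. With $J(\text{Worker})$ of size 4, for instance, both $\mathrm{Hom}(\Sigma_F I, J)$ and $\mathrm{Hom}(I, \Delta_F J)$ have $4^3 = 64$ elements, as the adjunction $\Sigma_F \dashv \Delta_F$ requires.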
Problem

Develops an algebraic approach to data integration
Combines functional programming, category theory, and database theory
Introduces a query language and a tool (AQL) that implements the formalism
Innovation

Algebraic approach combining functional programming and category theory
Adjoint data migration functors ($\Sigma_F$, $\Delta_F$, $\Pi_F$) induced by a schema morphism $F$
Query language based on for/where/return syntax
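In its simplest form, a for/where/return block can be read like a conjunctive query: bind variables to entities, keep only bindings that satisfy the equations, and return the named columns. The Python sketch below uses that naive nested-loop reading. In the paper each query instead denotes a sequence of data migration functors, so this is not AQL's semantics or implementation. The function `eval_query`, its argument layout, the query pseudo-syntax in the comment, and the `Emp`/`Dept` data are all hypothetical.

```python
from itertools import product

# Hypothetical sketch: naive nested-loop evaluation of a single
# for/where/return block over a finite instance. An instance maps each entity
# to a list of rows (dicts from column name to value). "for" binds variables
# to entities, "where" lists equations between (variable, column) pairs, and
# "return" names the output columns. This relational reading is for
# illustration only; it is not how AQL evaluates queries.


def eval_query(instance, for_clause, where, returns):
    variables = list(for_clause)
    result = []
    # Enumerate every binding of the for-variables to rows of their entities.
    for rows in product(*(instance[for_clause[v]] for v in variables)):
        env = dict(zip(variables, rows))
        # Keep the binding only if every where-equation holds.
        if all(env[v1][c1] == env[v2][c2] for (v1, c1), (v2, c2) in where):
            result.append({out: env[v][c] for out, (v, c) in returns.items()})
    return result


instance = {
    "Emp": [{"name": "alice", "dept": "d1"}, {"name": "bob", "dept": "d2"}],
    "Dept": [{"id": "d1", "title": "Math"}, {"id": "d2", "title": "CS"}],
}
# for e:Emp, d:Dept  where e.dept = d.id  return name := e.name, dept := d.title
rows = eval_query(
    instance,
    for_clause={"e": "Emp", "d": "Dept"},
    where=[(("e", "dept"), ("d", "id"))],
    returns={"name": ("e", "name"), "dept": ("d", "title")},
)
print(rows)  # [{'name': 'alice', 'dept': 'Math'}, {'name': 'bob', 'dept': 'CS'}]
```

Enumerating the full cross product is exponential in the number of for-variables; it is chosen here only because it mirrors the three clauses one-to-one.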
Patrick Schultz
Department of Mathematics, Massachusetts Institute of Technology
Ryan Wisnesky
Conexus AI
Computer Science