Synthesizing JSON Schema Transformers

📅 2024-05-27
🏛️ arXiv.org
📈 Citations: 1 (influential: 0)

🤖 AI Summary
To address the error-prone and inefficient manual rewriting of data transformation logic when a JSON Schema evolves, this paper proposes a type-directed, top-down program synthesis approach that automatically generates semantics-preserving JSON Schema converters. The method combines type inference, semantic constraint modeling, a rewrite system, and code generation driven by an intermediate representation (IR) to guarantee lossless data transformation and formal verifiability. It natively supports complex nested schemas and synthesizes correct, efficient, and human-readable Python and JavaScript conversion code. The approach is evaluated on real-world API configuration schemas and healthcare data integration scenarios, demonstrating its safety (through formal guarantees and empirical validation), its practical utility in industrial settings, and its generalizability across diverse schema evolution patterns. Experimental results confirm high accuracy, robustness to structural changes (e.g., field additions, type refinements, and restructured nested objects), and scalability to large, deeply nested schemas.

📝 Abstract
JSON (JavaScript Object Notation) is a data encoding that allows structured data to be used in a standardized and straightforward manner across systems. Schemas for JSON-formatted data can be constructed using the JSON Schema standard, which describes the data types, structure, and meaning of JSON-formatted data. JSON is commonly used for storing and transmitting information such as program configurations, web API requests and responses, or remote procedure calls, as well as data records such as healthcare information or other structured documents. Since JSON is a plaintext format with potentially highly complex definitions, changing the code that handles structured JSON data can be an arduous process when its storage or transmission schemas are modified. Our work describes a program synthesis method that generates a program which accepts data conforming to a given input JSON Schema and automatically converts it to conform to a target JSON Schema. We use a top-down, type-directed approach to search for programs using a set of rewrite rules that constrain the ways in which a schema can be modified without unintended data loss or corruption. Once a satisfying sequence of rewrites has been found, we pass an intermediate representation of the rewrite sequence to a code generation backend, which synthesizes a program that executes the data transformation. This system allows users to quickly and efficiently modify or augment their existing systems in safe ways at their interfaces.
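
The top-down, type-directed search described in the abstract can be illustrated with a minimal sketch. It assumes a tiny subset of JSON Schema (objects with `properties`, plus the `string`, `integer`, `number`, and `boolean` types) and a handful of hypothetical rewrite rules: keep a field unchanged, widen `integer` to `number`, rename a field when exactly one unmatched source field has a compatible type, and add a new field only when the target declares a `default`. Any source field left unconsumed makes the search fail, since dropping it would lose data. The rules, the tuple-based IR, and the names `synthesize` and `_synthesize_object` are illustrative assumptions, not the paper's rule set or implementation.

```python
from typing import Optional

# Hypothetical IR (an assumption for this sketch, not the paper's format):
#   ("copy",)                                   reuse the source value unchanged
#   ("widen",)                                  integer -> number; JSON integers are already numbers
#   ("const", value)                            fill a new field from the target's "default"
#   ("object", {tgt_name: (src_name, sub_ir)})  build an object field by field
PRIMITIVES = {"string", "integer", "number", "boolean"}


def synthesize(src: dict, tgt: dict) -> Optional[tuple]:
    """Top-down, type-directed search: dispatch on the target type, then recurse."""
    s_type, t_type = src.get("type"), tgt.get("type")
    if s_type == t_type and s_type in PRIMITIVES:
        return ("copy",)
    if s_type == "integer" and t_type == "number":
        return ("widen",)  # widening never loses information
    if s_type == "object" and t_type == "object":
        return _synthesize_object(src.get("properties", {}), tgt.get("properties", {}))
    return None  # no lossless rule applies (e.g. number -> integer)


def _synthesize_object(src_props: dict, tgt_props: dict) -> Optional[tuple]:
    fields = {}
    used = set()
    for name, t_sub in tgt_props.items():
        # Rule: a field with the same name and a compatible type is kept.
        if name in src_props:
            sub = synthesize(src_props[name], t_sub)
            if sub is not None:
                fields[name] = (name, sub)
                used.add(name)
                continue
        # Rule: rename, if exactly one unmatched source field has a compatible type.
        candidates = []
        for src_name, s_sub in src_props.items():
            if src_name in used or src_name in tgt_props:
                continue
            sub = synthesize(s_sub, t_sub)
            if sub is not None:
                candidates.append((src_name, sub))
        if len(candidates) == 1:
            src_name, sub = candidates[0]
            fields[name] = (src_name, sub)
            used.add(src_name)
            continue
        # Rule: a brand-new field is allowed only if the target declares a default.
        if "default" in t_sub:
            fields[name] = (None, ("const", t_sub["default"]))
            continue
        return None
    # Any source field not consumed would be silently dropped, so reject.
    if used != set(src_props):
        return None
    return ("object", fields)


v1 = {"type": "object", "properties": {
    "id": {"type": "integer"},
    "fullName": {"type": "string"},
}}
v2 = {"type": "object", "properties": {
    "id": {"type": "number"},
    "name": {"type": "string"},
    "active": {"type": "boolean", "default": True},
}}
print(synthesize(v1, v2))
```

For these two example versions, the search keeps `id` (widened), renames `fullName` to `name`, and fills `active` from its default. The search is top-down because it splits on the target schema first and recurses into each sub-schema; an ambiguous rename (two equally valid source fields) is rejected rather than guessed.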

Problem

Research questions and friction points this paper is trying to address.

Automating transformation between different JSON Schema versions
Generating programs to convert JSON data between schemas
Preventing data loss during JSON Schema evolution
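
One way to make the "no data loss" requirement concrete is to pair every data-level rewrite with an inverse and check that applying a rewrite sequence and then undoing it returns the original record. The sketch below assumes hypothetical `rename`, `nest`, and `drop` rules and a sampled round-trip check named `is_lossless`; it is a testing aid under those assumptions, not a substitute for a formal proof.

```python
import copy
from dataclasses import dataclass
from typing import Any, Callable

Record = dict[str, Any]


@dataclass(frozen=True)
class Rewrite:
    """Hypothetical rewrite rule: a data migration paired with its inverse."""
    name: str
    forward: Callable[[Record], Record]
    backward: Callable[[Record], Record]


def rename(old: str, new: str) -> Rewrite:
    def fwd(r: Record) -> Record:
        out = dict(r)
        out[new] = out.pop(old)
        return out

    def bwd(r: Record) -> Record:
        out = dict(r)
        out[old] = out.pop(new)
        return out

    return Rewrite(f"rename {old} -> {new}", fwd, bwd)


def nest(fields: list[str], into: str) -> Rewrite:
    """Move top-level fields into a new sub-object (nested restructuring)."""
    def fwd(r: Record) -> Record:
        out = {k: v for k, v in r.items() if k not in fields}
        out[into] = {k: r[k] for k in fields}
        return out

    def bwd(r: Record) -> Record:
        out = {k: v for k, v in r.items() if k != into}
        out.update(r[into])
        return out

    return Rewrite(f"nest {fields} into {into}", fwd, bwd)


def drop(field: str) -> Rewrite:
    """Deliberately lossy: the dropped value cannot be restored."""
    def fwd(r: Record) -> Record:
        return {k: v for k, v in r.items() if k != field}

    return Rewrite(f"drop {field}", fwd, lambda r: dict(r))


def is_lossless(rules: list[Rewrite], samples: list[Record]) -> bool:
    """Sampled check that undoing the rule sequence restores every sample."""
    for sample in samples:
        data = copy.deepcopy(sample)  # defensive copy so rules cannot mutate the sample
        try:
            for rule in rules:
                data = rule.forward(data)
            for rule in reversed(rules):
                data = rule.backward(data)
        except KeyError:
            return False  # an inverse that cannot even run also means data was lost
        if data != sample:
            return False
    return True


samples = [
    {"id": 1, "street": "Main St", "city": "Springfield"},
    {"id": 2, "street": "Elm St", "city": "Shelbyville"},
]
plan = [rename("id", "patientId"), nest(["street", "city"], "address")]
print(is_lossless(plan, samples))                        # True
print(is_lossless(plan + [drop("patientId")], samples))  # False
```

A sampled round trip can expose data loss but cannot prove its absence, which is why a rule-based design that only admits invertible rewrites is the stronger guarantee.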

Innovation

Methods, ideas, or system contributions that make the work stand out.

Program synthesis for JSON Schema transformation
Top-down type-directed rewrite rule search
Code generation backend for data transformation
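
A code generation backend of this kind can be sketched as a small recursive emitter that turns an IR tree into readable Python source. The IR here is hypothetical (`copy` reads the value at a key path in the source record, `const` inserts a literal, `object` builds a nested object), and `emit_expr`, `generate_python`, and `compile_converter` are illustrative names; the paper's actual IR and backends are not described in this summary.

```python
from typing import Any

# Hypothetical IR for this sketch (not the paper's actual format):
#   ("copy", path)          read the value at `path` (a tuple of keys) in the source record
#   ("const", value)        insert a literal value
#   ("object", {name: ir})  build an object whose fields are themselves IR nodes


def emit_expr(ir: tuple, var: str = "src") -> str:
    """Recursively translate one IR node into a Python expression string."""
    kind = ir[0]
    if kind == "copy":
        return var + "".join(f"[{key!r}]" for key in ir[1])
    if kind == "const":
        return repr(ir[1])
    if kind == "object":
        items = ", ".join(f"{name!r}: {emit_expr(sub, var)}" for name, sub in ir[1].items())
        return "{" + items + "}"
    raise ValueError(f"unknown IR node: {kind!r}")


def generate_python(ir: tuple, func_name: str = "convert") -> str:
    """Emit a complete, human-readable converter function as source text."""
    return f"def {func_name}(src):\n    return {emit_expr(ir)}\n"


def compile_converter(ir: tuple, func_name: str = "convert"):
    """Demo only: load the generated function into this process with exec."""
    namespace: dict[str, Any] = {}
    exec(generate_python(ir, func_name), namespace)
    return namespace[func_name]


ir = ("object", {
    "patientId": ("copy", ("id",)),
    "address": ("object", {
        "street": ("copy", ("street",)),
        "city": ("copy", ("city",)),
    }),
    "active": ("const", True),
})
print(generate_python(ir))
convert = compile_converter(ir)
print(convert({"id": 7, "street": "Main St", "city": "Springfield"}))
```

Emitting source text rather than interpreting the IR keeps the generated converter inspectable: here it is a single `return` of a dictionary literal that renames `id`, nests the address fields, and adds `active`. A JavaScript backend could follow the same recursion, emitting literals with `json.dumps` instead of `repr`.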