AI Summary
Data engineers widely use domain-specific languages (DSLs) embedded in Python to generate data pipelines. Because the host language is dynamically typed, type errors in the generated code surface only after the script has run, and the source location responsible is hard to pinpoint. To address this debugging problem, the authors propose gradual metaprogramming, which provides a migration path from dynamically typed to statically typed DSLs, detects code generation type errors earlier, and reports the source location responsible for each error. They define MetaGTLC, a metaprogramming calculus in which a gradually typed metalanguage manipulates a statically typed object language, type checking code fragments and performing runtime checks incrementally as fragments are spliced together, and they give it semantics by translation to the cast calculus MetaCC. The safety of MetaGTLC is mechanized in Agda: successful metaevaluation always generates a well-typed object program. The approach is aimed at making DSL-based code generation more reliable and easier to debug.
Abstract
Data engineers increasingly use domain-specific languages (DSLs) to generate the code for data pipelines. Such DSLs are often embedded in Python. Unfortunately, there are challenges in debugging the generation of data pipelines: an error in a Python DSL script is often detected too late, after the execution of the script, and the source code location that triggers the error is hard to pinpoint. In this paper, we focus on the F3 DSL of Meta (Facebook), which is a DSL embedded in Python (so it is dynamically-typed) to generate data pipeline description code that is statically-typed. We propose gradual metaprogramming to (1) provide a migration path toward statically typed DSLs, (2) immediately provide earlier detection of code generation type errors, and (3) report the source code location responsible for the type error. Gradual metaprogramming accomplishes this by type checking code fragments and incrementally performing runtime checks as they are spliced together. We define MetaGTLC, a metaprogramming calculus in which a gradually-typed metalanguage manipulates a statically-typed object language, and give semantics to it by translation to the cast calculus MetaCC. We prove that successful metaevaluation always generates a well-typed object program and mechanize the proof in Agda.
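To make the idea of checking fragments as they are spliced more concrete, here is a minimal Python sketch. It is not MetaGTLC, MetaCC, or the F3 API; the names `Code`, `lit_int`, `lit_str`, `add`, and `SpliceTypeError` are hypothetical, and the object language is assumed to have only two types, `int` and `str`. Each fragment carries its object-language type and the metaprogram location that built it, so a mismatch is reported during generation rather than after the generated pipeline is compiled.

```python
import inspect
from dataclasses import dataclass


class SpliceTypeError(TypeError):
    """Type error reported while the generator is still running."""


@dataclass(frozen=True)
class Code:
    text: str    # generated object-language source
    type: str    # object-language type (hypothetical: "int" or "str")
    origin: str  # "file:line" in the metaprogram where the fragment was built


def _origin() -> str:
    # Frame 0 is _origin, frame 1 the constructor, frame 2 the metaprogram line.
    caller = inspect.stack(context=0)[2]
    return f"{caller.filename}:{caller.lineno}"


def lit_int(n: int) -> Code:
    return Code(str(n), "int", _origin())


def lit_str(s: str) -> Code:
    return Code(repr(s), "str", _origin())


def add(lhs: Code, rhs: Code) -> Code:
    # Runtime check performed at splice time: both operands must be int.
    for operand in (lhs, rhs):
        if operand.type != "int":
            raise SpliceTypeError(
                f"expected int, got {operand.type} "
                f"from fragment built at {operand.origin}"
            )
    return Code(f"({lhs.text} + {rhs.text})", "int", _origin())
```

With these definitions, `add(lit_int(1), lit_str("x"))` raises `SpliceTypeError` as soon as the fragments are combined, and the message names the metaprogram line that created the string fragment. A fully static DSL would reject the same program before running it; the sketch only illustrates the dynamic end of that migration path.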