UniQL: Towards Dialect-Universal Benchmarking for Text-to-SQL

📅 2026-06-06

🤖 AI Summary

Existing text-to-SQL benchmarks are predominantly confined to SQLite, which limits their ability to evaluate how well models generalize across heterogeneous SQL dialects. To address this gap, this work proposes UniQL, the first large-scale benchmark covering 16 SQL dialects. It contains 24,544 human-verified, intent-aligned executable queries, with the same natural language questions, database schemas, and data across all dialects. UniQL is constructed through a hybrid pipeline that integrates database migration, SQL translation, execution-guided validation, iterative rule induction, and manual verification, which enables controlled cross-dialect evaluation. Experiments reveal substantial performance disparities among current large language models across dialects: strong performance on SQLite does not carry over to other SQL variants, which points to a critical gap in models’ dialect-agnostic capabilities.

📝 Abstract

Existing text-to-SQL benchmarks are largely centered on SQLite, making it difficult to evaluate whether models can generalize across heterogeneous SQL dialects. Real-world database systems, however, differ substantially in syntax, functions, type systems, and execution semantics, so the same natural language intent often requires dialect-specific SQL realizations. We introduce UniQL, a human-verified benchmark for cross-dialect text-to-SQL evaluation. UniQL aligns 1,534 natural language questions with executable SQL annotations across 16 SQL dialects, yielding 24,544 dialect-specific queries. All dialects share the same intents, aligned schemas, and database contents, enabling controlled evaluation of dialect generalization. UniQL is constructed through a hybrid pipeline combining database migration, SQL translation, execution-guided verification, iterative rule summarization, and human validation. Experiments on both open-source and closed-source LLMs show that current models remain far from dialect-universal, with substantial performance variation across database systems and limited transfer from SQLite success to other dialects. These findings highlight the need for aligned cross-dialect benchmarks and more dialect-aware text-to-SQL methods. Code and data are available at https://github.com/JerryGao818/UniQL.
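
To make the idea of dialect-specific realizations and execution-guided verification concrete, here is a minimal sketch in Python. It is not the UniQL pipeline: the table, the three example dialect labels, and the helpers `build_db`, `run`, and `execution_match` are all hypothetical, and the abstract does not say which 16 dialects UniQL covers. The sketch executes every realization on a single in-memory SQLite database, so the non-SQLite queries are expected to fail there, which illustrates why one intent needs a separate query per dialect.

```python
import sqlite3
from collections import Counter

# Sanity check of the benchmark size stated in the abstract.
QUESTIONS, DIALECTS = 1534, 16
assert QUESTIONS * DIALECTS == 24544

# One intent ("the two highest-paid employees") in three illustrative dialects.
# Assumption: these labels are examples only, not UniQL's actual dialect list.
DIALECT_SQL = {
    "sqlite": "SELECT name FROM employees ORDER BY salary DESC LIMIT 2",
    "sqlserver": "SELECT TOP 2 name FROM employees ORDER BY salary DESC",
    "oracle": "SELECT name FROM employees ORDER BY salary DESC "
              "FETCH FIRST 2 ROWS ONLY",
}

def build_db():
    """Create a tiny in-memory database (hypothetical schema and data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE employees (name TEXT, salary INTEGER)")
    conn.executemany(
        "INSERT INTO employees VALUES (?, ?)",
        [("Ada", 300), ("Bob", 100), ("Cy", 200)],
    )
    return conn

def run(conn, sql):
    """Return the result rows as a multiset, or None if execution fails."""
    try:
        return Counter(conn.execute(sql).fetchall())
    except sqlite3.Error:
        return None

def execution_match(conn, reference_sql, candidate_sql):
    """True if both queries execute and return the same rows (order ignored)."""
    ref = run(conn, reference_sql)
    cand = run(conn, candidate_sql)
    return ref is not None and ref == cand

conn = build_db()
reference = DIALECT_SQL["sqlite"]
results = {
    dialect: execution_match(conn, reference, sql)
    for dialect, sql in DIALECT_SQL.items()
}
print(results)
```

Only the SQLite realization matches the reference here, because SQLite rejects both `TOP` and `FETCH FIRST`. A real cross-dialect check would run each query on its own engine, against a migrated copy of the same data, and compare the results across engines.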

Problem

Research questions and friction points this paper is trying to address.

text-to-SQL

SQL dialects

cross-dialect generalization

benchmarking

database systems

Innovation

Methods, ideas, or system contributions that make the work stand out.

cross-dialect

text-to-SQL

benchmark