TailorSQL: An NL2SQL System Tailored to Your Query Workload

📅 2025-05-29
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing NL2SQL approaches rely solely on static database schemas and general-purpose large language models, so they miss implicit semantic patterns that recur in real-world workloads, such as frequent join paths and field aliasing conventions. The result is lower translation accuracy and higher latency. This work introduces TailorSQL, a workload-aware NL2SQL framework that incorporates historical SQL query logs as prior knowledge. It integrates four key components: workload-informed schema enhancement, query log mining, lightweight fine-tuning, and retrieval-augmented generation (RAG) coordination, enabling database-specific specialization. Evaluated on standardized benchmarks, the method achieves up to a 2× improvement in execution accuracy, alongside reductions in parsing error rate and average latency. These results support the importance of query workload customization for the practicality and performance of NL2SQL systems.

📝 Abstract
NL2SQL (natural language to SQL) translates natural language questions into SQL queries, thereby making structured data accessible to non-technical users, serving as the foundation for intelligent data applications. State-of-the-art NL2SQL techniques typically perform translation by retrieving database-specific information, such as the database schema, and invoking a pre-trained large language model (LLM) using the question and retrieved information to generate the SQL query. However, existing NL2SQL techniques miss a key opportunity which is present in real-world settings: NL2SQL is typically applied on existing databases which have already served many SQL queries in the past. The past query workload implicitly contains information which is helpful for accurate NL2SQL translation and is not apparent from the database schema alone, such as common join paths and the semantics of obscurely named tables and columns. We introduce TailorSQL, an NL2SQL system that takes advantage of information in the past query workload to improve both the accuracy and latency of translating natural language questions into SQL. By specializing to a given workload, TailorSQL achieves up to 2$\times$ improvement in execution accuracy on standardized benchmarks.
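
To make the idea of "common join paths" in a past workload concrete, here is a minimal sketch of how join conditions could be counted in a query log and turned into a text hint for an LLM prompt. This is an illustration only, not TailorSQL's actual pipeline: the log contents, the table names (`t_cust`, `t_ord`, `t_prod`), the regex-based parsing, and the helper names `mine_join_paths` and `format_hints` are all assumptions made for the example. A real system would likely use a proper SQL parser.

```python
import re
from collections import Counter

# Hypothetical query log; table and column names are invented for illustration.
QUERY_LOG = [
    "SELECT c.name, SUM(o.total) FROM t_cust c JOIN t_ord o ON c.id = o.cust_id GROUP BY c.name",
    "SELECT o.id FROM t_ord o JOIN t_cust c ON o.cust_id = c.id WHERE c.region = 'EU'",
    "SELECT p.title FROM t_ord o JOIN t_prod p ON o.prod_id = p.id",
]

# SQL keywords that may follow a table name and must not be read as an alias.
_KEYWORDS = r"(?:JOIN|ON|WHERE|GROUP|ORDER|INNER|LEFT|RIGHT|FULL|CROSS|LIMIT|HAVING|UNION)\b"

# Matches "FROM <table> [AS] [alias]" and "JOIN <table> [AS] [alias]".
TABLE_REF = re.compile(
    r"\b(?:FROM|JOIN)\s+(\w+)(?:\s+(?:AS\s+)?(?!" + _KEYWORDS + r")(\w+))?",
    re.IGNORECASE,
)
# Matches equality conditions of the form "a.col = b.col".
JOIN_COND = re.compile(r"(\w+)\.(\w+)\s*=\s*(\w+)\.(\w+)")


def mine_join_paths(queries):
    """Count how often each pair of columns is joined across the log."""
    counts = Counter()
    for sql in queries:
        # Map every alias (and every bare table name) to its table.
        alias_to_table = {}
        for table, alias in TABLE_REF.findall(sql):
            alias_to_table[alias or table] = table
            alias_to_table[table] = table
        for a1, c1, a2, c2 in JOIN_COND.findall(sql):
            t1, t2 = alias_to_table.get(a1), alias_to_table.get(a2)
            if t1 is None or t2 is None or t1 == t2:
                continue  # not a join between two known tables
            # Sort so "A.x = B.y" and "B.y = A.x" count as the same join.
            edge = tuple(sorted([f"{t1}.{c1}", f"{t2}.{c2}"]))
            counts[edge] += 1
    return counts


def format_hints(counts, top_k=2):
    """Render the most frequent joins as plain text for a prompt."""
    lines = [
        f"- {left} = {right} (seen {n}x)"
        for (left, right), n in counts.most_common(top_k)
    ]
    return "Frequently used joins in past queries:\n" + "\n".join(lines)


join_counts = mine_join_paths(QUERY_LOG)
hint_text = format_hints(join_counts)
```

The resulting `hint_text` could be appended to the schema information passed to the LLM, so that joins the workload already uses are visible even when the schema itself declares no foreign keys.
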
Problem

Research questions and friction points this paper is trying to address.

Improves NL2SQL accuracy using past query workload insights
Specializes translation by leveraging common join paths and semantics
Enhances both accuracy and latency in SQL generation
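
The accuracy figure quoted above is execution accuracy: a generated query counts as correct when running it returns the same result as the reference query. The sketch below shows one simple way to compute that with Python's built-in `sqlite3` module. The table, the query pairs, and the helper names `execution_match` and `execution_accuracy` are assumptions for illustration; benchmarks differ in details such as whether row order matters, and this sketch ignores row order.

```python
import sqlite3
from collections import Counter


def execution_match(conn, predicted_sql, gold_sql):
    """Return True if both queries yield the same multiset of rows."""
    try:
        predicted_rows = conn.execute(predicted_sql).fetchall()
    except sqlite3.Error:
        return False  # a predicted query that fails to run counts as incorrect
    gold_rows = conn.execute(gold_sql).fetchall()
    return Counter(predicted_rows) == Counter(gold_rows)


def execution_accuracy(conn, pairs):
    """Fraction of (predicted, gold) pairs whose results match."""
    matches = sum(execution_match(conn, pred, gold) for pred, gold in pairs)
    return matches / len(pairs)


# Tiny in-memory database with a hypothetical table.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE t_cust (id INTEGER PRIMARY KEY, name TEXT, region TEXT);
    INSERT INTO t_cust VALUES (1, 'Ada', 'EU'), (2, 'Bo', 'US'), (3, 'Cy', 'EU');
    """
)

pairs = [
    # Different SQL text, same result: counted as correct.
    ("SELECT name FROM t_cust WHERE region = 'EU'",
     "SELECT name FROM t_cust WHERE region IN ('EU')"),
    # Runs, but returns extra rows: counted as incorrect.
    ("SELECT name FROM t_cust",
     "SELECT name FROM t_cust WHERE region = 'US'"),
    # References a misspelled column and fails to execute: counted as incorrect.
    ("SELECT nme FROM t_cust",
     "SELECT name FROM t_cust"),
]

accuracy = execution_accuracy(conn, pairs)
```

Comparing results rather than SQL text means that two differently written but equivalent queries are both accepted, which is why this metric is commonly preferred over exact string match.
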
Innovation

Methods, ideas, or system contributions that make the work stand out.

Utilizes past query workload for NL2SQL
Improves accuracy and latency with workload data
Specializes to workload for better performance
Kapil Vaidya
Parallel Web Systems
Jialin Ding
Assistant Professor of Computer Science, Princeton University
ML for Systems, Data Management, Computer Systems
Sebastian Kosak
Technical University of Munich
D. Kernert
STACKIT
Chuan Lei
Amazon Web Services
Database, Data Analytics, Machine Learning
Xiao Qin
Amazon Web Services
Abhinav Tripathy
Amazon Web Services
Ramesh Balan
Amazon Web Services
B. Narayanaswamy
Amazon Web Services
Tim Kraska
MIT
Systems for ML, ML for Systems