🤖 AI Summary
In NL2SQL, large language models (LLMs) face two key bottlenecks: inaccurate task decomposition and inaccurate extraction of domain-specific keywords, both of which lead to errors in SQL generation. Existing datasets do not resolve this, since they over-fragment tasks and lack explicit domain-specific keyword annotations, which limits their effectiveness for fine-tuning. To address these issues, the authors propose DeKeyNLU, a dataset featuring explicit hierarchical task decomposition and domain keyword labeling, and DeKeySQL, a RAG-based pipeline comprising three modules: question understanding, entity retrieval, and SQL generation. DeKeySQL integrates retrieval-augmented generation (RAG) with chain-of-thought (CoT) reasoning to enhance semantic grounding. Evaluated on the BIRD and Spider dev sets, fine-tuning with DeKeyNLU raises SQL generation accuracy by 6.79 and 4.5 percentage points, respectively (62.31% to 69.10% on BIRD, 84.2% to 88.7% on Spider), mitigating over-decomposition and keyword omission. The work points toward a more interpretable approach to question understanding in NL2SQL.
📝 Abstract
Natural Language to SQL (NL2SQL) provides a new model-centric paradigm that simplifies database access for non-technical users by converting natural language queries into SQL commands. Recent advances, particularly those integrating Retrieval-Augmented Generation (RAG) and Chain-of-Thought (CoT) reasoning, have substantially improved NL2SQL performance. However, inaccurate task decomposition and keyword extraction by LLMs remain major bottlenecks and often lead to errors in SQL generation. Existing datasets aim to mitigate these issues through fine-tuning, but they suffer from over-fragmentation of tasks and lack domain-specific keyword annotations, which limits their effectiveness. To address these limitations, we present DeKeyNLU, a novel dataset of 1,500 meticulously annotated QA pairs aimed at refining task decomposition and enhancing keyword extraction precision for the RAG pipeline. We further propose DeKeySQL, a RAG-based NL2SQL pipeline fine-tuned with DeKeyNLU that employs three distinct modules, for user question understanding, entity retrieval, and generation, to improve SQL generation accuracy. We benchmarked multiple model configurations within the DeKeySQL RAG pipeline. Experimental results demonstrate that fine-tuning with DeKeyNLU significantly improves SQL generation accuracy on the dev sets of both BIRD (62.31% to 69.10%) and Spider (84.2% to 88.7%).
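
To make the three-module structure concrete, the following is a minimal sketch of how question understanding, entity retrieval, and generation might be chained. It is not the authors' implementation: every name here (`Understanding`, `understand_question`, `retrieve_entities`, `generate_sql`, `answer`, `build_demo_db`) is hypothetical, the two LLM-backed stages are replaced by hard-coded placeholders, and retrieval is reduced to exact string matching against a toy SQLite table.

```python
import sqlite3
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Understanding:
    """Hypothetical output of the question-understanding stage."""
    main_task: str
    sub_tasks: List[str]
    keywords: List[str]


def understand_question(question: str) -> Understanding:
    # Placeholder for a call to a fine-tuned LLM (assumption: it would return
    # a task decomposition plus keywords). Hard-coded for illustration only.
    return Understanding(
        main_task="Count the schools in Alameda",
        sub_tasks=["filter schools by county", "count the matching rows"],
        keywords=["schools", "Alameda"],
    )


def retrieve_entities(conn: sqlite3.Connection,
                      keywords: List[str]) -> List[Tuple[str, str, str]]:
    """Return (table, column, value) triples whose stored value equals a keyword.

    Exact matching stands in for whatever retrieval method a real pipeline uses.
    """
    matches = []
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'")]
    for table in tables:
        # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk).
        columns = [row[1] for row in conn.execute(f'PRAGMA table_info("{table}")')]
        for column in columns:
            for keyword in keywords:
                hit = conn.execute(
                    f'SELECT 1 FROM "{table}" WHERE "{column}" = ? LIMIT 1',
                    (keyword,),
                ).fetchone()
                if hit:
                    matches.append((table, column, keyword))
    return matches


def generate_sql(understanding: Understanding,
                 matches: List[Tuple[str, str, str]]) -> Tuple[str, tuple]:
    # Placeholder for a CoT-prompted LLM. This template only handles the single
    # "count rows where column = value" shape used in the demo.
    if not matches:
        raise ValueError("no database entities matched the extracted keywords")
    table, column, value = matches[0]
    return f'SELECT COUNT(*) FROM "{table}" WHERE "{column}" = ?', (value,)


def answer(question: str, conn: sqlite3.Connection) -> int:
    """Run the three stages in order and execute the resulting SQL."""
    understanding = understand_question(question)
    matches = retrieve_entities(conn, understanding.keywords)
    sql, params = generate_sql(understanding, matches)
    return conn.execute(sql, params).fetchone()[0]


def build_demo_db() -> sqlite3.Connection:
    """Create a toy in-memory database (invented data, for illustration)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE schools (name TEXT, county TEXT)")
    conn.executemany(
        "INSERT INTO schools VALUES (?, ?)",
        [("Lincoln High", "Alameda"), ("Oak Grove", "Alameda"), ("Bay View", "Marin")],
    )
    return conn


if __name__ == "__main__":
    print(answer("How many schools are in Alameda?", build_demo_db()))
```

The point of the separation is that each stage has a narrow, inspectable output: the decomposition and keywords can be checked before retrieval runs, and the retrieved entities can be checked before any SQL is written.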