AI Summary
Zero-shot Text-to-SQL suffers from significant limitations in performance and cross-domain generalization due to the absence of generation constraints. This work addresses this challenge by distilling universal generation rules from failure cases for the first time, introducing a Map-Reduce-based rule distillation framework. The approach integrates knowledge-enhanced schema representation, rule-driven structured reasoning, and execution-guided early stopping to enable high-quality SQL generation without any in-context examples. Evaluated on the Spider benchmark, the method achieves execution accuracies of 87.2% on the development set and 88.6% on the test set, establishing a new state-of-the-art in zero-shot settings, and further attains 81.3% on UrbanPlan. Notably, it outperforms zero-shot baselines of leading closed-source models even when deployed on a 4B-parameter language model.
Abstract
Text-to-SQL translates natural language into executable SQL queries. Few-shot in-context learning methods built upon large language models (LLMs) achieve strong performance, yet their reliance on demonstrations limits cross-domain generalization and consumes substantial context window space. Existing zero-shot methods, lacking effective generation constraints, still fall short of few-shot approaches. We observe that LLM failures in zero-shot Text-to-SQL are not random but exhibit systematic, recurring patterns. Building on this observation, we propose a fully zero-shot Text-to-SQL framework that distills core generation rules from failure cases through a Map-Reduce-based rule distillation pipeline and improves generation quality via three complementary modules: knowledge-augmented schema representation, which supplements missing semantics in Data Definition Language; a rule-driven structured reasoning framework that suppresses structural deviations; and Execution-Guided Early Stopping, which enables low-cost self-correction. On Spider, the proposed framework achieves up to 87.2% and 88.6% execution accuracy on the Dev and Test sets, respectively, establishing a new zero-shot state-of-the-art and surpassing multiple few-shot and fine-tuning methods built upon GPT-4/4o. On the domain-specific dataset UrbanPlan, it achieves 81.3%, confirming that the rule distillation approach generalizes across domains. Moreover, when equipped with a 4B-parameter model, the framework surpasses zero-shot baselines of leading closed-source models, demonstrating strong model generality.
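The abstract does not spell out how Execution-Guided Early Stopping decides when to stop, so the following is only a minimal sketch of the general idea, under the assumption that "stopping" means accepting the first candidate query that executes without a database error. The function name `first_executable_query`, the toy `singer` table, and the candidate list are all hypothetical and are not taken from the paper.

```python
import sqlite3
from typing import Iterable, Optional


def first_executable_query(
    conn: sqlite3.Connection, candidates: Iterable[str]
) -> Optional[str]:
    """Return the first candidate that runs without error, or None.

    Assumption: a successful execution is treated as the stopping signal.
    The paper's actual criterion may differ.
    """
    for sql in candidates:
        try:
            conn.execute(sql).fetchall()
        except sqlite3.Error:
            # Execution failed: move on to the next candidate.
            continue
        # Execution succeeded: stop early and skip remaining candidates.
        return sql
    return None


# Hypothetical toy database, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE singer (id INTEGER PRIMARY KEY, name TEXT, age INTEGER)")
conn.execute("INSERT INTO singer (name, age) VALUES ('Ann', 31), ('Bo', 24)")

candidates = [
    "SELECT nmae FROM singer",                 # misspelled column, fails
    "SELECT name FROM singer WHERE age > 30",  # executes, accepted
    "SELECT name FROM singer",                 # never reached
]
chosen = first_executable_query(conn, candidates)
```

If `candidates` is a generator that calls the model lazily, later candidates are never generated once one succeeds, which is where the cost saving in a scheme of this kind would come from.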