AI Summary
This work addresses a limitation of large language models (LLMs) in Text-to-SQL tasks: while capable of generating SQL queries, they lack transferable, table-level semantic reasoning ability. To remedy this, we propose a reasoning-centric paradigm built on a two-stage framework: (1) synthesizing fine-grained, chain-of-thought reasoning traces from real SQL queries; and (2) applying Group Relative Policy Optimization (GRPO), a reinforcement learning objective, to train models to acquire dataset-agnostic, semantic-level reasoning steps. Combined with chain-of-thought distillation, supervised SQL fine-tuning, and model quantization, our approach significantly improves generalization and interpretability on reasoning-intensive benchmarks (BIRD, CRT-QA): a distilled, quantized LLaMA-3B gains +20% accuracy and Qwen-1.5B gains +5%. Our core contributions are: (i) reframing Text-to-SQL as a table-level reasoning capability acquisition task; and (ii) the first application of GRPO to structured reasoning alignment.
Abstract
This work reframes the Text-to-SQL task as a pathway for teaching large language models (LLMs) to reason over and manipulate tabular data, moving beyond the traditional focus on query generation. We propose a two-stage framework that leverages SQL supervision to develop transferable table reasoning capabilities. First, we synthesize detailed chain-of-thought (CoT) traces from real-world SQL queries, providing step-by-step, clause-level supervision that teaches the model how to traverse, filter, and aggregate table fields. Second, we introduce a Group Relative Policy Optimization (GRPO) reinforcement learning objective that connects SQL execution accuracy to generalizable reasoning by rewarding steps that extend beyond task-specific syntax and transfer across datasets. Empirically, our approach improves performance on standard Text-to-SQL benchmarks and achieves substantial gains on reasoning-intensive datasets such as BIRD and CRT-QA, demonstrating enhanced generalization and interpretability. Specifically, the distilled and quantized LLaMA model achieved a 20% increase in accuracy when trained on Text-to-SQL tasks, while Qwen achieved a 5% increase. These results suggest that SQL can serve not only as a target formalism but also as an effective scaffold for learning robust, transferable reasoning over structured data.
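The GRPO objective mentioned above dispenses with a learned value baseline: for each prompt, a group of completions is sampled, and each completion's reward (here, e.g., SQL execution accuracy) is normalized against the group's mean and standard deviation. A minimal sketch of that group-relative advantage, with illustrative names of our own (not taken from the paper's code):

```python
# Illustrative sketch of GRPO's group-relative advantage computation.
# Function and variable names are hypothetical, not from the paper.
from statistics import mean, pstdev

def group_relative_advantages(rewards, eps=1e-8):
    """Normalize per-completion rewards within one sampled group.

    GRPO replaces a critic/value baseline with group statistics:
        A_i = (r_i - mean(r)) / (std(r) + eps)
    so completions better than their group get positive advantage.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std over the group
    return [(r - mu) / (sigma + eps) for r in rewards]

# Example: reward 1.0 if the generated SQL executes to the gold answer, else 0.0.
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
print(advantages)  # correct completions get positive advantage, incorrect negative
```

These advantages then weight a clipped policy-gradient update, as in PPO-style training; the key design choice is that the baseline comes from the sampled group itself, which suits binary execution-accuracy rewards well.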