Sparks of Tabular Reasoning via Text2SQL Reinforcement Learning

📅 2025-04-23
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
This work addresses a key limitation of large language models (LLMs) in Text-to-SQL tasks: although they can generate SQL queries, they lack transferable, table-level semantic reasoning ability. To remedy this, the authors propose a reasoning-centric, two-stage framework: (1) synthesizing fine-grained, chain-of-thought reasoning traces from real SQL queries; and (2) applying Group Relative Policy Optimization (GRPO), a reinforcement-learning objective, to train models to acquire dataset-agnostic, semantic-level reasoning steps. Combined with chain-of-thought distillation, supervised SQL fine-tuning, and model quantization, the approach substantially improves generalization and interpretability on reasoning-intensive benchmarks (BIRD, CRT-QA): the distilled and quantized LLaMA-3B gains +20% accuracy, and Qwen-1.5B gains +5%. Core contributions: (i) reframing Text-to-SQL as a task for acquiring table-level reasoning capability; and (ii) the first application of GRPO to structured reasoning alignment.

πŸ“ Abstract
This work reframes the Text-to-SQL task as a pathway for teaching large language models (LLMs) to reason over and manipulate tabular data, moving beyond the traditional focus on query generation. We propose a two-stage framework that leverages SQL supervision to develop transferable table reasoning capabilities. First, we synthesize detailed chain-of-thought (CoT) traces from real-world SQL queries, providing step-by-step, clause-level supervision that teaches the model how to traverse, filter, and aggregate table fields. Second, we introduce a Group Relative Policy Optimization (GRPO) reinforcement learning objective that connects SQL execution accuracy to generalizable reasoning by encouraging steps that extend beyond task-specific syntax and transfer across datasets. Empirically, our approach improves performance on standard Text-to-SQL benchmarks and achieves substantial gains on reasoning-intensive datasets such as BIRD and CRT-QA, demonstrating enhanced generalization and interpretability. Specifically, the distilled-quantized LLaMA model achieved a 20% increase in accuracy when trained on Text-to-SQL tasks, while Qwen achieved a 5% increase. These results suggest that SQL can serve not only as a target formalism but also as an effective scaffold for learning robust, transferable reasoning over structured data.
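The core of GRPO is that it scores each sampled completion against the other completions in its sampling group, rather than against a learned critic. A minimal sketch of that group-relative advantage, assuming a binary execution-accuracy reward (the reward definition and group size here are illustrative, not the paper's exact setup):

```python
import statistics

def grpo_advantages(rewards):
    """Group-relative advantages: normalize each sampled completion's
    reward by the mean and std of its own sampling group, so no
    separate value function (critic) is needed."""
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards) or 1.0  # guard against zero std
    return [(r - mean) / std for r in rewards]

# Toy reward group for one prompt: 1.0 if the generated SQL executes to
# the gold answer, 0.0 otherwise (an execution-accuracy reward).
rewards = [1.0, 0.0, 0.0, 1.0]
print(grpo_advantages(rewards))  # [1.0, -1.0, -1.0, 1.0]
```

These advantages then weight the policy-gradient update, so completions whose SQL executes correctly are reinforced relative to their group.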
Problem

Research questions and friction points this paper is trying to address.

Teaching LLMs to reason over tabular data via Text-to-SQL
Developing transferable table reasoning capabilities using SQL supervision
Improving generalization and interpretability in structured data reasoning
Innovation

Methods, ideas, or system contributions that make the work stand out.

Synthesizes detailed, clause-level CoT traces from real SQL queries
Applies GRPO reinforcement learning driven by SQL execution accuracy
Improves both standard Text-to-SQL benchmarks and reasoning-intensive datasets (BIRD, CRT-QA)
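The clause-level supervision described above can be illustrated with a small sketch: walk a SQL query clause by clause, emit one reasoning step per clause, and check the query by executing it. The table, data, and trace wording are hypothetical, assumed here only for illustration, not the paper's actual trace templates:

```python
import sqlite3

# Build a tiny in-memory table to execute against.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [("EU", 10.0), ("EU", 30.0), ("US", 5.0)])

query = "SELECT region, SUM(amount) FROM sales WHERE amount > 6 GROUP BY region"

# One reasoning step per SQL clause, in execution order.
trace = [
    "Step 1 (FROM): scan the `sales` table.",
    "Step 2 (WHERE): keep only rows with amount > 6.",
    "Step 3 (GROUP BY): bucket the remaining rows by region.",
    "Step 4 (SELECT): compute SUM(amount) for each bucket.",
]
print("\n".join(trace))
print(conn.execute(query).fetchall())  # [('EU', 40.0)]
```

Executing the source query alongside the trace is what lets such synthesized steps stay grounded: the final answer of the trace can be verified against the query's actual result.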