AI Summary
This work addresses a limitation of large language models (LLMs) in Text-to-SQL tasks: while capable of generating SQL queries, they lack transferable, table-level semantic reasoning ability. To remedy this, we propose a reasoning-centric paradigm built on a two-stage framework: (1) synthesizing fine-grained, chain-of-thought reasoning traces from real SQL queries; and (2) applying Group Relative Policy Optimization (GRPO), a reinforcement learning objective, to train models to acquire dataset-agnostic, semantic-level reasoning steps. Combined with chain-of-thought distillation, supervised SQL fine-tuning, and model quantization, our approach significantly improves generalization and interpretability on reasoning-intensive benchmarks (BIRD, CRT-QA): a distilled, quantized LLaMA-3B gains +20% accuracy and Qwen-1.5B gains +5%. Our core contributions are: (i) reframing Text-to-SQL as a table-level reasoning capability acquisition task; and (ii) the first application of GRPO to structured reasoning alignment.
Abstract
This work reframes the Text-to-SQL task as a pathway for teaching large language models (LLMs) to reason over and manipulate tabular data, moving beyond the traditional focus on query generation. We propose a two-stage framework that leverages SQL supervision to develop transferable table reasoning capabilities. First, we synthesize detailed chain-of-thought (CoT) traces from real-world SQL queries, providing step-by-step, clause-level supervision that teaches the model how to traverse, filter, and aggregate table fields. Second, we introduce a Group Relative Policy Optimization (GRPO) reinforcement learning objective that connects SQL execution accuracy to generalizable reasoning by rewarding steps that extend beyond task-specific syntax and transfer across datasets. Empirically, our approach improves performance on standard Text-to-SQL benchmarks and achieves substantial gains on reasoning-intensive datasets such as BIRD and CRT-QA, demonstrating enhanced generalization and interpretability. Specifically, the distilled and quantized LLaMA model achieved a 20% increase in accuracy when trained on Text-to-SQL tasks, while Qwen achieved a 5% increase. These results suggest that SQL can serve not only as a target formalism but also as an effective scaffold for learning robust, transferable reasoning over structured data.
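The GRPO objective mentioned above dispenses with a learned value baseline: for each prompt, a group of completions is sampled, and each completion's reward (here, e.g., SQL execution accuracy) is normalized against the group's mean and standard deviation. A minimal sketch of that group-relative advantage, with illustrative names of our own (not taken from the paper's code):

```python
# Illustrative sketch of GRPO's group-relative advantage computation.
# Function and variable names are hypothetical, not from the paper.
from statistics import mean, pstdev

def group_relative_advantages(rewards, eps=1e-8):
    """Normalize per-completion rewards within one sampled group.

    GRPO replaces a critic/value baseline with group statistics:
        A_i = (r_i - mean(r)) / (std(r) + eps)
    so completions better than their group get positive advantage.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std over the group
    return [(r - mu) / (sigma + eps) for r in rewards]

# Example: reward 1.0 if the generated SQL executes to the gold answer, else 0.0.
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
print(advantages)  # correct completions get positive advantage, incorrect negative
```

These advantages then weight a clipped policy-gradient update, as in PPO-style training; the key design choice is that the baseline comes from the sampled group itself, which suits binary execution-accuracy rewards well.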