CogniSQL-R1-Zero: Lightweight Reinforced Reasoning for Efficient SQL Generation

📅 2025-07-08
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
To address insufficient SQL correctness and executability in complex Text-to-SQL tasks, this paper proposes a lightweight, weakly supervised reinforcement learning framework. The method eliminates reliance on intermediate representation supervision and hand-crafted reward shaping, instead leveraging only sparse feedback signals (SQL execution outcomes for correctness and syntactic validity for compliance) to ensure task alignment and training stability. Integrating a 7B-parameter language model, execution-guided policy optimization, format-tag constraints, and a newly curated, diverse, weakly supervised reasoning dataset, the approach achieves state-of-the-art execution accuracy on the BIRD benchmark, substantially outperforming comparably sized fine-tuned models. Training requires only four A100 GPUs, and the authors publicly release the high-quality reasoning dataset to advance low-resource, robust Text-to-SQL research.

πŸ“ Abstract
Translating natural language into SQL (Text-to-SQL) remains a core challenge at the intersection of language understanding and structured data access. Although large language models (LLMs) have improved fluency, generating correct and executable SQL, especially for complex queries, continues to be challenging. We introduce CogniSQL-R1-Zero, a reinforcement learning (RL) framework and model that produces accurate SQL using a lightweight reward signal based on execution correctness and format-tag compliance. By avoiding intermediate supervision, hybrid pipelines, and complex reward shaping, our method encourages stable learning and stronger alignment with the ultimate task objective: producing executable programs. CogniSQL-R1-Zero achieves state-of-the-art execution accuracy on the BIRD Text-to-SQL benchmark, outperforming prior supervised and instruction-tuned baselines including SFT CodeS-7B, DeepSeek-Coder 236B, and Mistral 123B, despite being trained on a significantly smaller 7B backbone. This result underscores the scalability and efficiency of our RL-based approach, trained on just four NVIDIA A100 GPUs (40 GB VRAM each). To support further research in efficient and interpretable Text-to-SQL modeling, we release two curated datasets: (i) a collection of 5,024 reasoning traces with varying context lengths, and (ii) a positive-sampled corpus of 36,356 weakly supervised queries, each annotated with six semantically diverse reasoning paths. Together, these contributions advance scalable, execution-aligned Text-to-SQL generation.
Problem

Research questions and friction points this paper is trying to address.

Improving SQL generation from natural language queries
Enhancing execution accuracy for complex SQL queries
Reducing resource usage with lightweight reinforcement learning
Innovation

Methods, ideas, or system contributions that make the work stand out.

Lightweight RL framework for SQL generation
Execution correctness as reward signal
State-of-the-art execution accuracy on the BIRD Text-to-SQL benchmark
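The reward design described above (execution correctness plus format-tag compliance, with no intermediate supervision) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the paper's implementation: the tag names `<think>`/`<answer>`, the 0.5 format weight, and the use of SQLite for execution checks are all assumptions made here for concreteness.

```python
import re
import sqlite3


def format_reward(completion: str) -> float:
    """Partial reward for syntactic compliance: the completion must wrap its
    reasoning and final query in the expected tags (tag names assumed here)."""
    pattern = r"<think>.*?</think>\s*<answer>.*?</answer>"
    return 0.5 if re.fullmatch(pattern, completion.strip(), re.DOTALL) else 0.0


def execution_reward(pred_sql: str, gold_sql: str, conn: sqlite3.Connection) -> float:
    """Reward for execution correctness: the predicted query must execute and
    return the same result set as the gold query."""
    try:
        pred = conn.execute(pred_sql).fetchall()
    except sqlite3.Error:
        return 0.0  # invalid or non-executable SQL earns nothing
    gold = conn.execute(gold_sql).fetchall()
    return 1.0 if set(pred) == set(gold) else 0.0


def total_reward(completion: str, gold_sql: str, conn: sqlite3.Connection) -> float:
    """Sparse scalar reward: format compliance plus execution correctness.
    No hand-crafted shaping beyond these two terms."""
    reward = format_reward(completion)
    match = re.search(r"<answer>(.*?)</answer>", completion, re.DOTALL)
    if match:
        reward += execution_reward(match.group(1).strip(), gold_sql, conn)
    return reward
```

In an RL loop such as execution-guided policy optimization, this scalar would score each sampled completion against the target database; comparing unordered result sets (rather than SQL strings) is what lets semantically different but equivalent queries earn full reward.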