SQL-Trail: Multi-Turn Reinforcement Learning with Interleaved Feedback for Text-to-SQL

📅 2026-01-25
🤖 AI Summary
This work proposes SQL-Trail, a multi-turn reinforcement learning framework for Text-to-SQL that emulates the iterative reasoning and error correction of human experts. Unlike conventional single-pass generation, SQL-Trail interacts with the database environment and uses execution feedback to iteratively refine its SQL outputs. The framework combines an adaptive turn-budget allocation mechanism, which adjusts interaction depth to question difficulty, with a composite reward that encourages both SQL correctness and efficient exploration. Across multiple benchmarks, SQL-Trail reports new state-of-the-art performance and up to 18x higher data efficiency than prior single-pass reinforcement learning methods. Notably, its 7B and 14B models outperform substantially larger proprietary systems by 5% on average.

📝 Abstract
While large language models (LLMs) have substantially improved Text-to-SQL generation, a pronounced gap remains between AI systems and human experts on challenging benchmarks such as BIRD-SQL. We argue this gap stems largely from the prevailing single-pass paradigm, which lacks the iterative reasoning, schema exploration, and error-correction behaviors that humans naturally employ. To address this limitation, we introduce SQL-Trail, a multi-turn reinforcement learning (RL) agentic framework for Text-to-SQL. Rather than producing a query in one shot, SQL-Trail interacts with the database environment and uses execution feedback to iteratively refine its predictions. Our approach centers on two key ideas: (i) an adaptive turn-budget allocation mechanism that scales the agent's interaction depth to match question difficulty, and (ii) a composite reward panel that jointly incentivizes SQL correctness and efficient exploration. Across benchmarks, SQL-Trail sets a new state of the art and delivers strong data efficiency, up to 18x higher than prior single-pass RL state-of-the-art methods. Notably, our 7B and 14B models outperform substantially larger proprietary systems by 5% on average, underscoring the effectiveness of interactive, agentic workflows for robust Text-to-SQL generation.
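
The interaction pattern described in the abstract (propose a query, execute it, read the feedback, refine) can be illustrated with a minimal sketch. This is not the paper's implementation: `execute_sql`, `multi_turn_refine`, and `toy_propose` are hypothetical names, the policy is a hard-coded stand-in for an LLM, and stopping at the first query that executes is a simplification (a query that runs is not necessarily correct).

```python
import sqlite3
from typing import Callable, List, Tuple

Turn = Tuple[str, str]  # (sql, feedback) for one interaction turn


def execute_sql(conn: sqlite3.Connection, sql: str) -> Tuple[bool, str]:
    """Run a query and return (succeeded, feedback text for the next turn)."""
    try:
        rows = conn.execute(sql).fetchall()
        return True, f"rows={rows[:5]}"  # truncate long results
    except sqlite3.Error as exc:
        return False, f"error: {exc}"


def multi_turn_refine(
    conn: sqlite3.Connection,
    question: str,
    propose: Callable[[str, List[Turn]], str],
    max_turns: int = 3,
) -> List[Turn]:
    """Query the (hypothetical) policy up to max_turns times, feeding back results."""
    history: List[Turn] = []
    for _ in range(max_turns):
        sql = propose(question, history)
        ok, feedback = execute_sql(conn, sql)
        history.append((sql, feedback))
        if ok:  # simplification: stop once a query executes without error
            break
    return history


def toy_propose(question: str, history: List[Turn]) -> str:
    """Hard-coded stand-in for an LLM policy: first guess has a typo."""
    if not history:
        return "SELECT nme FROM singers"  # misspelled column triggers an error
    return "SELECT name FROM singers"  # corrected after seeing the error


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE singers (name TEXT)")
conn.execute("INSERT INTO singers VALUES ('Ada')")
trail = multi_turn_refine(conn, "List all singer names.", toy_propose)
for sql, feedback in trail:
    print(sql, "->", feedback)
```

In this toy run the first query fails on the misspelled column, the error text is appended to the history, and the second turn produces a query that executes.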
Problem

Research questions and friction points this paper is trying to address.

Text-to-SQL
iterative reasoning
schema exploration
error correction
multi-turn interaction
Innovation

Methods, ideas, or system contributions that make the work stand out.

multi-turn reinforcement learning
interleaved feedback
adaptive turn-budget allocation
composite reward
agentic Text-to-SQL
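
The abstract names an adaptive turn budget and a composite reward but gives no formulas on this page. The sketch below shows one plausible shape for each, purely as an illustration: the linear budget rule, the difficulty score in [0, 1], the efficiency term, and the weights `w_correct` and `w_efficiency` are all assumptions, not the paper's definitions.

```python
def turn_budget(difficulty: float, min_turns: int = 1, max_turns: int = 6) -> int:
    """Map an assumed difficulty score in [0, 1] to a number of allowed turns."""
    d = min(max(difficulty, 0.0), 1.0)  # clamp out-of-range scores
    return min_turns + round(d * (max_turns - min_turns))


def composite_reward(
    correct: bool,
    turns_used: int,
    budget: int,
    w_correct: float = 1.0,     # assumed weight on execution correctness
    w_efficiency: float = 0.2,  # assumed weight on finishing in few turns
) -> float:
    """Combine a correctness term with an efficiency bonus for correct answers."""
    if not correct:
        return 0.0  # no efficiency bonus for a wrong final query
    used = min(max(turns_used, 1), budget)
    # Efficiency is 1.0 when solved in one turn, 0.0 when the full budget is used.
    efficiency = 1.0 - (used - 1) / max(budget - 1, 1)
    return w_correct * 1.0 + w_efficiency * efficiency


easy_budget = turn_budget(0.0)
hard_budget = turn_budget(1.0)
fast_reward = composite_reward(correct=True, turns_used=1, budget=4)
slow_reward = composite_reward(correct=True, turns_used=4, budget=4)
```

Under these assumptions a correct answer always earns more than an incorrect one, and among correct answers the one reached in fewer turns earns slightly more, which is the qualitative behavior the abstract attributes to the reward design.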