SPA: A SQL-Plan-Aware Reinforcement Learning Framework for Query Rewriting with LLMs

📅 2026-06-07
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
Existing SQL rewriting approaches struggle to substantially improve the performance of modern analytical queries while preserving semantic correctness. This work formulates SQL rewriting as a policy optimization problem and introduces SPA, a GRPO-based reinforcement learning framework whose reward combines semantic equivalence, textual rewrite distance, physical-plan divergence, and runtime speedup. To cope with sparse rewards, it adds Probability-Gated Adaptive Reward Shaping, a query-level curriculum that unlocks higher-level rewards only after a rollout group has mastered the lower-level objectives, and an on-policy self-improvement step that recycles the current policy's slowdown rewrites as training signals to improve sample efficiency. In experiments on both in-distribution (IID) and out-of-distribution (OOD) workloads, the method outperforms rule-based and large language model (LLM) baselines in end-to-end runtime, substantially reduces slowdown rewrites, and improves tail latency.
📝 Abstract
SQL query rewriting is a well-established technique for improving database performance without schema or index changes, yet finding effective rewrites for modern analytical workloads remains difficult: rule-based methods are limited to predefined transformations, while LLM-based approaches often produce rewrites that are semantically valid but compile to equivalent physical plans or degrade runtime performance. We present SPA, a SQL-Plan-Aware reinforcement learning framework that trains LLMs to rewrite queries using physical execution feedback. SPA formulates rewriting as a policy optimization problem and extends GRPO with rewards spanning semantic equivalence, textual rewrite distance, physical-plan divergence, and runtime speedup. To handle reward sparsity across query difficulty, SPA introduces Probability-Gated Adaptive Reward Shaping, a query-level curriculum that unlocks higher-level rewards only once a rollout group achieves sufficient mastery of lower-level objectives, and further improves sample efficiency through on-policy self-improvement by recycling slowdown rewrites from the current policy as targeted training signals. On both IID and OOD workloads, SPA outperforms rule-based and strong LLM baselines in end-to-end runtime, substantially reduces harmful slowdown rewrites, and yields strong tail-latency gains.
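The gated, hierarchical reward described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the `Rollout` fields, the `gate` threshold of 0.5, the reward magnitudes, and the clamping of the speedup term are all assumptions chosen for illustration, and the textual rewrite distance term is omitted for brevity. The sketch only shows the general shape of the idea: higher-level rewards are unlocked for a query once enough rollouts in its group satisfy the lower-level objective, and the resulting rewards are turned into group-relative advantages in the usual GRPO style.

```python
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass
class Rollout:
    """One sampled rewrite for a query (all fields are hypothetical)."""
    equivalent: bool    # rewrite returns the same results as the original
    plan_changed: bool  # physical plan differs from the original plan
    speedup: float      # original runtime / rewrite runtime


def gated_rewards(group, gate=0.5):
    """Assumed three-level reward: equivalence -> plan divergence -> speedup.

    A level is unlocked only if the fraction of rollouts in the group
    that satisfy the previous level reaches `gate`.
    """
    n = len(group)
    p_equiv = sum(r.equivalent for r in group) / n
    p_plan = sum(r.equivalent and r.plan_changed for r in group) / n

    rewards = []
    for r in group:
        if not r.equivalent:
            rewards.append(0.0)  # no credit for non-equivalent rewrites
            continue
        reward = 1.0  # level 1: semantic equivalence
        if p_equiv >= gate and r.plan_changed:
            reward += 1.0  # level 2: physical-plan divergence
            if p_plan >= gate:
                # level 3: runtime term, clamped to [-1, 1]; slowdowns are penalized
                reward += max(-1.0, min(1.0, r.speedup - 1.0))
        rewards.append(reward)
    return rewards


def group_advantages(rewards, eps=1e-6):
    """GRPO-style advantage: standardize rewards within the rollout group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(x - mu) / (sigma + eps) for x in rewards]


group = [
    Rollout(equivalent=True, plan_changed=True, speedup=2.0),
    Rollout(equivalent=True, plan_changed=True, speedup=0.5),
    Rollout(equivalent=True, plan_changed=False, speedup=1.0),
    Rollout(equivalent=False, plan_changed=False, speedup=1.0),
]
rewards = gated_rewards(group)          # [3.0, 1.5, 1.0, 0.0]
advantages = group_advantages(rewards)
```

With a stricter gate (for example `gate=0.8`), the same group no longer unlocks the plan and speedup levels, so every equivalent rewrite receives only the base reward. That is the curriculum effect the abstract describes: harder queries are trained on the easier objectives first.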
Problem

Research questions and friction points this paper is trying to address.

SQL query rewriting
LLM-based optimization
physical execution plan
runtime performance degradation
semantic equivalence
Innovation

Methods, ideas, or system contributions that make the work stand out.

SQL query rewriting
reinforcement learning
physical execution plan
reward shaping
large language models