SPA: A SQL-Plan-Aware Reinforcement Learning Framework for Query Rewriting with LLMs

📅 2026-06-07

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Existing SQL rewriting approaches struggle to substantially improve the performance of modern analytical queries while preserving semantic correctness. This work formulates SQL rewriting as a policy optimization problem and introduces a GRPO-based reinforcement learning framework that combines four reward signals: semantic equivalence, textual rewrite distance, physical-plan divergence, and runtime speedup. To cope with sparse rewards, it adds Probability-Gated Adaptive Reward Shaping, a query-level curriculum that unlocks higher-level rewards only after a rollout group has sufficiently mastered the lower-level objectives. An on-policy self-improvement mechanism further improves sample efficiency by recycling the current policy's slowdown rewrites as targeted training signals. On both in-distribution (IID) and out-of-distribution (OOD) workloads, the method outperforms rule-based and large language model (LLM) baselines in end-to-end runtime, substantially reduces performance-degrading rewrites, and improves tail latency.

📝 Abstract

SQL query rewriting is a well-established technique for improving database performance without schema or index changes, yet finding effective rewrites for modern analytical workloads remains difficult: rule-based methods are limited to predefined transformations, while LLM-based approaches often produce rewrites that are semantically valid but compile to equivalent physical plans or degrade runtime performance. We present SPA, a SQL-Plan-Aware reinforcement learning framework that trains LLMs to rewrite queries using physical execution feedback. SPA formulates rewriting as a policy optimization problem and extends GRPO with rewards spanning semantic equivalence, textual rewrite distance, physical-plan divergence, and runtime speedup. To handle reward sparsity across query difficulty, SPA introduces Probability-Gated Adaptive Reward Shaping, a query-level curriculum that unlocks higher-level rewards only once a rollout group achieves sufficient mastery of lower-level objectives, and further improves sample efficiency through on-policy self-improvement by recycling slowdown rewrites from the current policy as targeted training signals. On both IID and OOD workloads, SPA outperforms rule-based and strong LLM baselines in end-to-end runtime, substantially reduces harmful slowdown rewrites, and yields strong tail-latency gains.
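The gated reward idea described in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the `Rollout` fields, the ordering and scaling of the four reward levels, the `gate` threshold, and the use of the group's empirical pass rate as the gating probability are all assumptions made for illustration. Only the group-relative advantage (reward minus group mean, divided by group standard deviation) follows the standard GRPO formulation.

```python
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass
class Rollout:
    """One candidate rewrite sampled for a query (hypothetical fields)."""
    equivalent: bool      # rewrite returns the same results as the original
    text_distance: float  # textual rewrite distance, assumed scaled to [0, 1]
    plan_changed: bool    # physical plan differs from the original plan
    speedup: float        # original runtime / rewritten runtime


def _clip(x, lo=-1.0, hi=1.0):
    return max(lo, min(hi, x))


# Reward levels from lowest to highest (assumed ordering and scaling).
LEVELS = [
    lambda r: 1.0 if r.equivalent else 0.0,
    lambda r: r.text_distance,
    lambda r: 1.0 if r.plan_changed else 0.0,
    lambda r: _clip(r.speedup - 1.0),
]


def gated_rewards(group, gate=0.5):
    """Add each reward level in turn; stop unlocking higher levels once
    the share of rollouts that passed the current level falls below `gate`."""
    rewards = [0.0] * len(group)
    active = [True] * len(group)  # rollouts that passed every level so far
    for score in LEVELS:
        scores = [score(r) if a else 0.0 for r, a in zip(group, active)]
        rewards = [rw + s for rw, s in zip(rewards, scores)]
        active = [a and s > 0.0 for a, s in zip(active, scores)]
        if sum(active) / len(group) < gate:
            break  # group has not mastered this level yet
    return rewards


def group_advantages(rewards, eps=1e-6):
    """GRPO-style group-relative advantage: standardize within the group."""
    mu, sigma = mean(rewards), pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


group = [
    Rollout(True, 0.4, True, 1.5),
    Rollout(True, 0.2, False, 1.0),
    Rollout(False, 0.9, True, 2.0),
    Rollout(True, 0.3, True, 0.8),
]
rewards = gated_rewards(group, gate=0.5)
advantages = group_advantages(rewards)
```

In this sketch a non-equivalent rewrite never collects higher-level rewards, and a slowdown (speedup below 1) is penalized only after the runtime level has been unlocked for the group. A stricter `gate` keeps a hard query at the equivalence level for longer.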

Problem

Research questions and friction points this paper is trying to address.

SQL query rewriting

LLM-based optimization

physical execution plan

runtime performance degradation

semantic equivalence

Innovation

Methods, ideas, or system contributions that make the work stand out.

SQL query rewriting

reinforcement learning

physical execution plan