SEFRQO: A Self-Evolving Fine-Tuned RAG-Based Query Optimizer

📅 2025-08-24

🤖 AI Summary

To address the cold-start problem and poor adaptability to workload and schema changes in traditional learned query optimizers (LQOs), this paper proposes SEFRQO, a self-evolving query optimization framework based on retrieval-augmented generation (RAG). The method combines the in-context learning capability of large language models (LLMs) with real execution feedback through dynamic prompting: RAG retrieves similar queries and the historical execution records of the same query to construct context-aware prompts, while supervised and reinforcement fine-tuning prepare the LLM to generate syntactically correct, performance-efficient query hints. This lets the optimizer keep improving without frequent retraining. On the CEB and Stack workloads, the approach reduces query latency by up to 65.05% and 93.57%, respectively, compared to PostgreSQL, and outperforms state-of-the-art learned optimizers. The key contribution is combining LLM in-context reasoning with execution feedback in a RAG-driven, self-updating optimization pipeline.

📝 Abstract

Query optimization is a crucial problem in database systems that has been studied for decades. Learned query optimizers (LQOs) can improve performance over time by incorporating feedback; however, they suffer from cold-start issues and often require retraining when workloads shift or schemas change. Recent LLM-based query optimizers leverage pre-trained and fine-tuned LLMs to mitigate these challenges. Nevertheless, they neglect LLMs' in-context learning and execution records as feedback for continuous evolution. In this paper, we present SEFRQO, a Self-Evolving Fine-tuned RAG-based Query Optimizer. SEFRQO mitigates the cold-start problem of LQOs by continuously learning from execution feedback via a Retrieval-Augmented Generation (RAG) framework. We employ both supervised fine-tuning and reinforcement fine-tuning to prepare the LLM to produce syntactically correct and performance-efficient query hints. Moreover, SEFRQO leverages the LLM's in-context learning capabilities by dynamically constructing prompts with references to similar queries and the historical execution record of the same query. This self-evolving paradigm iteratively optimizes the prompt to minimize query execution latency. Evaluations show that SEFRQO outperforms state-of-the-art LQOs, achieving up to 65.05% and 93.57% reductions in query latency on the CEB and Stack workloads, respectively, compared to PostgreSQL.
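The dynamic prompt construction described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration, not the paper's implementation: the `ExecutionRecord` type, the bag-of-words cosine similarity used as the retriever, the prompt layout, and the hint strings are all hypothetical placeholders.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class ExecutionRecord:
    """One past execution: the query text, the hint used, and observed latency."""
    sql: str
    hint: str          # placeholder hint string, format is an assumption
    latency_ms: float


def tokenize(sql):
    """Bag-of-words token counts; a stand-in for a real query embedding."""
    return Counter(re.findall(r"[a-z_]+", sql.lower()))


def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve_similar(query, records, k=2):
    """Return the k records of *other* queries most similar to `query`."""
    q = tokenize(query)
    others = [r for r in records if r.sql != query]
    return sorted(others, key=lambda r: cosine(q, tokenize(r.sql)), reverse=True)[:k]


def build_prompt(query, records, k=2):
    """Assemble a prompt from similar queries plus this query's own history."""
    lines = ["Reference queries:"]
    for r in retrieve_similar(query, records, k):
        lines.append(f"- {r.sql} | hint: {r.hint} | latency_ms: {r.latency_ms}")
    lines.append("Previous attempts on this query:")
    for r in records:
        if r.sql == query:
            lines.append(f"- hint: {r.hint} | latency_ms: {r.latency_ms}")
    lines.append(f"Query: {query}")
    lines.append("Produce a query hint that minimizes latency.")
    return "\n".join(lines)


records = [
    ExecutionRecord("SELECT * FROM title t JOIN cast_info c ON t.id = c.movie_id",
                    "HashJoin(t c)", 120.0),
    ExecutionRecord("SELECT count(*) FROM users", "SeqScan(users)", 15.0),
    ExecutionRecord("SELECT * FROM title t JOIN movie_info m ON t.id = m.movie_id",
                    "NestLoop(t m)", 300.0),
]
query = "SELECT * FROM title t JOIN movie_info m ON t.id = m.movie_id"
prompt = build_prompt(query, records, k=1)
print(prompt)
```

In this sketch the prompt carries two kinds of context: records of other, similar queries and earlier attempts on the same query, mirroring the two sources the abstract names. A real system would likely replace the token-overlap retriever with a learned embedding and a vector index.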

Problem

Research questions and friction points this paper is trying to address.

Addresses cold-start issues in learned query optimizers

Reduces the need to retrain when workloads shift or schemas change

Leverages execution feedback for continuous optimizer evolution

Innovation

Methods, ideas, or system contributions that make the work stand out.

Self-evolving RAG framework for continuous learning

Combines supervised and reinforcement fine-tuning so the LLM produces syntactically correct, performance-efficient query hints

Dynamic prompt construction for in-context learning, using similar queries and the same query's historical execution records
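The self-evolving idea listed above, where each execution result feeds the next round of hint generation, can be sketched as a simple feedback loop. This is a toy illustration under stated assumptions: `fake_llm` and `fake_execute` are hypothetical stand-ins for the fine-tuned LLM and the database, and the hint strings and latencies in `SIMULATED_LATENCY` are made-up placeholders, not results from the paper.

```python
def evolve(query, propose_hint, execute, rounds=3):
    """Run propose -> execute -> record for a fixed number of rounds.

    `history` holds (hint, latency_ms) pairs and is passed back to the
    proposer each round, so later proposals can use earlier feedback.
    """
    history = []
    for _ in range(rounds):
        hint = propose_hint(query, history)
        latency = execute(query, hint)
        history.append((hint, latency))
    best = min(history, key=lambda h: h[1])
    return best, history


# Made-up latencies for three placeholder hints (illustrative only).
SIMULATED_LATENCY = {
    "NestLoop(a b)": 300.0,
    "MergeJoin(a b)": 180.0,
    "HashJoin(a b)": 90.0,
}


def fake_llm(query, history):
    """Stand-in for the LLM: try an untried hint, else reuse the best so far."""
    tried = {hint for hint, _ in history}
    for hint in SIMULATED_LATENCY:
        if hint not in tried:
            return hint
    return min(history, key=lambda h: h[1])[0]


def fake_execute(query, hint):
    """Stand-in for running the hinted query and measuring latency."""
    return SIMULATED_LATENCY[hint]


best, history = evolve("SELECT * FROM a JOIN b ON a.id = b.a_id",
                       fake_llm, fake_execute, rounds=4)
print(best)
print(history)
```

The point of the sketch is the data flow rather than the proposer's policy: the execution record is the only state carried between rounds, so nothing in the loop requires retraining the model.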
