🤖 AI Summary
Existing database optimization approaches struggle to adapt to complex software-hardware environments, with manual design being inefficient and AI-driven methods hindered by evaluation bottlenecks. This work proposes a large language model-based automated optimization framework that jointly refines database algorithms and a lightweight evaluator through a co-evolutionary mechanism, enabling efficient feedback and code generation. For the first time in the database domain, this approach achieves deployable automated algorithm discovery, outperforming state-of-the-art solutions across three critical tasks: buffer management, query rewriting, and index selection. Notably, the derived deterministic query rewriting strategy reduces query latency by up to 6.8×.
📄 Abstract
As the complexity of modern workloads and hardware increasingly outpaces human research and engineering capacity, existing methods for database performance optimization struggle to keep up. To address this gap, a new class of techniques, termed AI-Driven Research for Systems (ADRS), uses large language models to automate solution discovery. This approach shifts optimization from manual system design to automated code generation. The key obstacle in applying ADRS, however, is the evaluation pipeline. Since these frameworks rapidly generate hundreds of candidates without human supervision, they depend on fast and accurate feedback from evaluators to converge on effective solutions. Building such evaluators is especially difficult for complex database systems. To enable the practical application of ADRS in this domain, we propose automating the design of evaluators by co-evolving them with the solutions. We demonstrate the effectiveness of this approach through three case studies optimizing buffer management, query rewriting, and index selection. Our automated evaluators enable the discovery of novel algorithms that outperform state-of-the-art baselines (e.g., a deterministic query rewrite policy that achieves up to 6.8× lower latency), demonstrating that addressing the evaluation bottleneck unlocks the potential of ADRS to generate highly optimized, deployable code for next-generation data systems.
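To make the co-evolution idea concrete, here is a minimal, purely illustrative sketch of such a loop, not the paper's actual method. All names and numbers are assumptions: a "candidate algorithm" is reduced to a single numeric parameter, the slow ground-truth benchmark (`true_cost`) stands in for real query latency, and the cheap evaluator is a nearest-neighbor proxy that is refined each round with one new ground-truth measurement, mirroring how solutions and evaluator improve together.

```python
import random

random.seed(0)

def true_cost(param):
    """Slow, accurate benchmark (stand-in for measuring real query latency)."""
    return (param - 3.7) ** 2

def make_evaluator(measured):
    """Cheap proxy evaluator: score a candidate by the true cost of the
    nearest already-benchmarked parameter."""
    def evaluator(param):
        nearest = min(measured, key=lambda p: abs(p - param))
        return measured[nearest]
    return evaluator

def co_evolve(rounds=20, pool=8):
    best = 0.0
    measured = {best: true_cost(best)}   # ground-truth observations so far
    evaluator = make_evaluator(measured)
    for _ in range(rounds):
        # Generator stand-in (the LLM's role): mutate the current best.
        candidates = [best + random.uniform(-1, 1) for _ in range(pool)]
        # Fast feedback: the cheap evaluator picks one candidate to benchmark.
        chosen = min(candidates, key=evaluator)
        # Co-evolution step: one slow benchmark call both scores the chosen
        # candidate and refines the evaluator's knowledge of the landscape.
        measured[chosen] = true_cost(chosen)
        evaluator = make_evaluator(measured)
        # Keep the best solution according to ground truth seen so far.
        best = min(measured, key=measured.get)
    return best

best = co_evolve()
print(round(best, 3))
```

The key property the sketch captures is the budget split: the expensive benchmark runs once per round, while the cheap evaluator screens the whole candidate pool, and every expensive measurement feeds back into a better evaluator.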