Automatic Database Configuration Debugging using Retrieval-Augmented Language Models

📅 2024-12-10
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
Database configuration tuning remains a critical challenge for DBAs, as existing approaches struggle to simultaneously achieve high diagnostic accuracy and actionable remediation recommendations. This paper proposes the first Retrieval-Augmented Generation (RAG) framework specifically designed for DBMS configuration debugging. It integrates heterogeneous knowledge sources—including historical support tickets, official documentation, and real-time telemetry—enabling natural-language-based interactive diagnosis. By combining domain-specific retrieval over heterogeneous documents with fine-tuned large language model (LLM) reasoning, the framework achieves end-to-end problem localization and executable fix generation. Evaluations on a real-world DBMS configuration debugging dataset demonstrate substantial improvements over state-of-the-art baselines: diagnostic accuracy increases by 23.6%, and recommendation adoption rate reaches 89.4%. To our knowledge, this is the first approach to deliver automated configuration debugging that is simultaneously accurate, interpretable, and operationally executable.

📝 Abstract
Database management system (DBMS) configuration debugging, e.g., diagnosing poorly configured DBMS knobs and generating troubleshooting recommendations, is crucial for optimizing DBMS performance. However, the configuration debugging process is tedious and sometimes challenging, even for seasoned database administrators (DBAs) with sufficient experience in DBMS configuration and a good understanding of DBMS internals (e.g., MySQL or Oracle). To address this difficulty, we propose Andromeda, a framework that utilizes large language models (LLMs) to enable automatic DBMS configuration debugging. Andromeda serves as a natural surrogate for DBAs, answering a wide range of natural language (NL) questions on DBMS configuration issues and generating diagnostic suggestions to fix them. Nevertheless, directly prompting LLMs with these professional questions may result in overly generic and often unsatisfying answers. To this end, we propose a retrieval-augmented generation (RAG) strategy that provides matched domain-specific contexts for the question from multiple sources: related historical questions, troubleshooting manuals, and DBMS telemetry. These contexts significantly improve the performance of configuration debugging. To support the RAG strategy, we develop a document retrieval mechanism that handles heterogeneous documents and design an effective method for telemetry analysis. Extensive experiments on real-world DBMS configuration debugging datasets show that Andromeda significantly outperforms existing solutions.
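
The multi-source retrieval idea in the abstract can be illustrated with a toy sketch. This is not Andromeda's implementation: the abstract does not specify the retrieval model, the telemetry analysis, or the prompt format. The example assumes a plain bag-of-words cosine similarity as the retriever, treats telemetry as short text summaries, and uses hypothetical names throughout (`retrieve`, `build_prompt`, `SOURCES`, and all sample documents). It only shows how matched context from several sources might be assembled into one LLM prompt.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split into word tokens; knob names such as a_b_c stay whole."""
    return re.findall(r"[a-z0-9_]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in set(a) & set(b))
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(question, documents, k=1):
    """Return up to k documents with nonzero similarity to the question."""
    q = Counter(tokenize(question))
    scored = [(cosine(q, Counter(tokenize(d))), d) for d in documents]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for score, doc in scored[:k] if score > 0]


def build_prompt(question, sources, k=1):
    """Assemble one prompt from the best matches in each source (hypothetical format)."""
    sections = []
    for name, docs in sources.items():
        hits = retrieve(question, docs, k)
        if hits:
            sections.append(f"[{name}]\n" + "\n".join(hits))
    context = "\n\n".join(sections)
    return (
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer with a concrete configuration change."
    )


# Invented sample data standing in for the three source types named in the abstract.
SOURCES = {
    "historical questions": [
        "Queries became slow after a data import; increasing innodb_buffer_pool_size resolved it.",
        "Connections were refused under load; raising max_connections fixed the errors.",
    ],
    "manuals": [
        "innodb_buffer_pool_size sets the memory used to cache table and index data.",
        "max_connections limits the number of simultaneous client connections.",
    ],
    "telemetry": [
        "slow queries rising; buffer pool hit ratio dropped while disk reads increased",
        "connection count stable and well below the limit",
    ],
}

question = "Why are my queries slow? Should I change innodb_buffer_pool_size?"
prompt = build_prompt(question, SOURCES)
print(prompt)
```

In this sketch the resulting prompt would be passed to an LLM in place of the bare question. A realistic system would likely replace the word-overlap scorer with a learned retriever and derive the telemetry context from metric time series rather than hand-written summaries.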
Problem

Research questions and friction points this paper is trying to address.

database optimization
automatic detection
settings correction
Innovation

Methods, ideas, or system contributions that make the work stand out.

Andromeda
RAG Strategy
Large Language Models (LLMs)