🤖 AI Summary
This work targets the shortcomings of lightweight large language models (LLMs) when generating test cases for database management system (DBMS)-specific SQL dialects: the generated queries often contain syntactic errors, are semantically uniform, and exercise only shallow execution paths. To overcome these limitations, the authors propose MIST, a framework that, they report, is the first to integrate Monte Carlo Tree Search (MCTS) into LLM-driven DBMS testing. MIST combines feature-guided, error-driven synthesis with coverage-oriented mutation, leveraging a hierarchical feature tree and a joint feedback mechanism to enhance both the diversity and the depth of generated test cases. Experimental results show that MIST improves average line, function, and branch coverage by 43.3%, 32.3%, and 46.4%, respectively, across three major DBMSs, reaching up to 69.3% line coverage in the Optimizer module.
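The hierarchical feature tree with error-driven feedback could look roughly like the sketch below. This is an illustration under assumptions, not MIST's actual implementation: the `FeatureNode` structure, the toy SQL taxonomy, and the inverse-error weighting are all hypothetical stand-ins for whatever schema and feedback rule the paper uses.

```python
import random

class FeatureNode:
    """One node in a hierarchical SQL feature tree (hypothetical structure)."""
    def __init__(self, name, children=None):
        self.name = name
        self.children = children or []
        self.error_count = 0  # error-driven feedback: count past syntactic failures

    def is_leaf(self):
        return not self.children

def sample_feature_path(root):
    """Walk root-to-leaf, preferring child features with fewer past errors."""
    path = [root.name]
    node = root
    while not node.is_leaf():
        weights = [1.0 / (1 + c.error_count) for c in node.children]
        node = random.choices(node.children, weights=weights, k=1)[0]
        path.append(node.name)
    return path

# Toy taxonomy: categories are illustrative, not the paper's actual feature tree.
tree = FeatureNode("SQL", [
    FeatureNode("Query", [FeatureNode("JOIN"), FeatureNode("WindowFunction")]),
    FeatureNode("DDL", [FeatureNode("CREATE INDEX"), FeatureNode("ALTER TABLE")]),
])

path = sample_feature_path(tree)  # e.g. ["SQL", "Query", "JOIN"]
```

A sampled path would then be rendered into the LLM prompt as the target feature combination, and a failed query would bump `error_count` along that path so the sampler drifts toward features the model handles correctly.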
📝 Abstract
Database Management Systems (DBMSs) are fundamental infrastructure for modern data-driven applications, and thorough testing with high-quality SQL test cases is essential for ensuring their reliability. Traditional approaches such as fuzzing can be effective for specific DBMSs, but adapting them to different proprietary dialects requires substantial manual effort. Large Language Models (LLMs) offer a promising route to automated SQL test generation, but they face two critical challenges in industrial environments. First, lightweight models are widely used in organizations due to security and privacy constraints, yet they struggle to generate syntactically valid queries for proprietary SQL dialects. Second, LLM-generated queries are often semantically similar and exercise only shallow execution paths, so coverage quickly plateaus. To address these challenges, we propose MIST, an LLM-based test case generatIon framework for DBMSs through Monte Carlo Tree Search. MIST consists of two stages. Feature-Guided Error-Driven Test Case Synthesis constructs a hierarchical feature tree and uses error feedback to guide LLM generation, producing syntactically valid and semantically diverse queries for different DBMS dialects. Monte Carlo Tree Search-Based Test Case Mutation jointly optimizes seed query selection and mutation rule application under coverage feedback, boosting code coverage by exploring deeper execution paths. Experiments on three widely used DBMSs with four lightweight LLMs show that MIST achieves average improvements of 43.3% in line coverage, 32.3% in function coverage, and 46.4% in branch coverage over the baseline approach, with line coverage reaching as high as 69.3% in the Optimizer module.
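The second stage, jointly optimizing seed query selection and mutation rule application under coverage feedback, can be sketched as a two-level UCB1 search: the first tree level chooses a seed query, the second chooses a mutation rule, and coverage gain is backpropagated along the path. This is a minimal sketch under assumptions; the seed names, rule names, and the `coverage_gain` stub stand in for MIST's real instrumented-coverage feedback and are not the paper's actual design.

```python
import math
import random

random.seed(0)  # deterministic toy run

class Node:
    def __init__(self, label, children=None):
        self.label = label
        self.children = children or []
        self.visits = 0
        self.value = 0.0  # accumulated coverage reward

def ucb1(child, parent_visits, c=1.4):
    """Standard UCB1: exploit high mean reward, explore rarely tried children."""
    if child.visits == 0:
        return float("inf")
    return child.value / child.visits + c * math.sqrt(math.log(parent_visits) / child.visits)

def select(root):
    """Descend by UCB1: level 1 picks a seed query, level 2 picks a mutation rule."""
    path = [root]
    node = root
    while node.children:
        parent_visits = max(1, node.visits)
        node = max(node.children, key=lambda ch: ucb1(ch, parent_visits))
        path.append(node)
    return path

def backpropagate(path, reward):
    for n in path:
        n.visits += 1
        n.value += reward

# Hypothetical coverage oracle: in a real system this would be coverage gain
# measured from an instrumented DBMS run; here one (seed, rule) pair is best.
def coverage_gain(seed, rule):
    if (seed, rule) == ("seed_join", "swap_operator"):
        return 1.0
    return random.random() * 0.3

rules = ["swap_operator", "add_predicate", "nest_subquery"]
root = Node("root", [Node(s, [Node(r) for r in rules])
                     for s in ["seed_join", "seed_window"]])

for _ in range(200):
    path = select(root)
    seed, rule = path[1].label, path[2].label
    backpropagate(path, coverage_gain(seed, rule))

best_seed = max(root.children, key=lambda n: n.visits).label  # "seed_join"
```

The point of the joint formulation is that a seed is only as valuable as the mutations available from it: the search concentrates visits on the (seed, rule) pairs that keep unlocking new coverage, rather than ranking seeds and rules independently.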