🤖 AI Summary
Existing query optimizers face three key challenges: an explosive cost-model search space, the high training overhead and low accuracy of value networks, and the absence of proactive mechanisms for identifying queries they handle poorly. This paper proposes a hybrid cost-driven two-stage optimization framework. First, a compatibility pre-checker, built on Mahalanobis distance, filters out queries incompatible with the learned models. Second, compatible queries undergo staged optimization: coarse-grained optimization employs a value-network-guided beam search, while fine-grained optimization leverages a lightweight learned cost model for precise plan ranking. Key innovations include the first compatibility pre-checking mechanism for learned optimizers, a synergistic two-stage paradigm, and a transfer-based joint training strategy that reuses training data to reduce modeling costs. Evaluated on three standard benchmarks, the approach reduces average query latency by 57.3% compared to PostgreSQL and outperforms the current state-of-the-art learned optimizer by 54.6%, significantly improving both plan quality and robustness.
📝 Abstract
The query optimizer is a crucial module in database management systems. Existing optimizers follow two flawed paradigms: (1) cost-based optimizers use dynamic programming with cost models but suffer from search-space explosion and the constraints of heuristic pruning; (2) value-based optimizers train value networks to enable efficient beam search, but incur higher training costs and lower accuracy. Both also lack mechanisms to detect queries on which they may perform poorly. To find more efficient plans, we propose Delta, a mixed cost-based query optimization framework that consists of a compatible query detector and a two-stage planner. Delta first employs a Mahalanobis distance-based detector to preemptively filter out incompatible queries on which the planner might perform poorly. For compatible queries, Delta activates its two-stage mixed cost-based planner. Stage I serves as a coarse-grained filter that generates high-quality candidate plans via beam search guided by a value network, relaxing precision requirements and narrowing the search space. Stage II employs a fine-grained ranker that selects the best plan from the candidates using a learned cost model. Moreover, to reduce training costs, we reuse and augment the training data from Stage I to train the model in Stage II. Experimental results on three workloads demonstrate that Delta identifies higher-quality plans, achieving an average 2.34x speedup over PostgreSQL and outperforming state-of-the-art learned methods by 2.21x.
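The compatibility detector described above can be illustrated with a minimal sketch: fit the mean and covariance of training-query feature vectors, then flag any incoming query whose Mahalanobis distance from the training distribution exceeds a threshold, routing it back to the native optimizer. This is an illustrative reconstruction, not the paper's implementation; the feature extraction, threshold choice, and regularization constant here are all assumptions.

```python
import numpy as np

class CompatibilityDetector:
    """Sketch of a Mahalanobis-distance compatibility check.

    Queries whose feature vectors lie far from the training
    distribution are deemed incompatible with the learned planner
    and would fall back to the native (e.g. PostgreSQL) optimizer.
    The threshold value is a hypothetical tuning parameter.
    """

    def __init__(self, threshold: float):
        self.threshold = threshold

    def fit(self, train_features: np.ndarray) -> None:
        # Estimate mean and covariance of the training query features.
        self.mu = train_features.mean(axis=0)
        cov = np.cov(train_features, rowvar=False)
        # Small ridge term (assumed) keeps the covariance invertible.
        self.cov_inv = np.linalg.inv(cov + 1e-6 * np.eye(cov.shape[0]))

    def is_compatible(self, query_features: np.ndarray) -> bool:
        # Mahalanobis distance: sqrt((x - mu)^T Sigma^{-1} (x - mu))
        diff = query_features - self.mu
        dist = float(np.sqrt(diff @ self.cov_inv @ diff))
        return dist <= self.threshold
```

A query featurized near the center of the training distribution passes the check, while an outlier (e.g. a join pattern unlike anything seen in training) is filtered out before the two-stage planner runs.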