ZeroCard: Cardinality Estimation with Zero Dependence on Target Databases -- No Data, No Query, No Retraining

📅 2025-10-09
🤖 AI Summary
Problem: Existing learning-based cardinality estimation methods rely heavily on the target database's data distribution or query logs, so they generalize poorly and are hard to deploy across databases. Method: We propose ZeroCard, the first fully zero-dependency cardinality estimator: it requires no access to raw data, query samples, or retraining on the target database, and instead uses only schema-level semantic information for cross-database transfer. Its core innovation is the first application of schema semantics modeling to cardinality estimation, featuring a query-agnostic schema encoder and a large-scale pre-trained distribution prediction model trained on real-world tables. Results: Experiments across multiple real datasets and query workloads show that ZeroCard achieves high accuracy (mean Q-error < 2.5), significantly outperforming state-of-the-art zero-shot and few-shot approaches. It supports plug-and-play deployment, eliminating costly adaptation overhead and improving practical usability.
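
The accuracy figure above is reported as mean Q-error. For readers unfamiliar with the metric, here is a minimal sketch of how Q-error is commonly computed: the larger of estimate/actual and actual/estimate, so 1.0 is a perfect estimate and over- and under-estimation are penalized symmetrically. The function names and the floor of 1 row (a common convention to avoid division by zero) are assumptions for illustration, not details taken from the paper.

```python
from typing import Sequence


def q_error(estimated: float, actual: float, floor: float = 1.0) -> float:
    """Q-error of one estimate: max(est/act, act/est), always >= 1.0.

    Both values are clamped to `floor` (assumed here to be 1 row) so that
    empty results do not cause a division by zero.
    """
    est = max(estimated, floor)
    act = max(actual, floor)
    return max(est / act, act / est)


def mean_q_error(estimates: Sequence[float], actuals: Sequence[float]) -> float:
    """Arithmetic mean of per-query Q-errors over a workload."""
    if len(estimates) != len(actuals) or not estimates:
        raise ValueError("need two non-empty sequences of equal length")
    errors = [q_error(e, a) for e, a in zip(estimates, actuals)]
    return sum(errors) / len(errors)
```

Under this definition, an estimate of 100 rows for a query that returns 50 has the same Q-error as an estimate of 50 rows for a query that returns 100.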

📝 Abstract
Cardinality estimation is a fundamental task in database systems and plays a critical role in query optimization. Despite significant advances in learning-based cardinality estimation methods, most existing approaches remain difficult to generalize to new datasets due to their strong dependence on raw data or queries, thus limiting their practicality in real scenarios. To overcome these challenges, we argue that semantics in the schema may benefit cardinality estimation, and leveraging such semantics may alleviate these dependencies. To this end, we introduce ZeroCard, the first semantics-driven cardinality estimation method that can be applied without any dependence on raw data access, query logs, or retraining on the target database. Specifically, we propose to predict data distributions using schema semantics, thereby avoiding raw data dependence. Then, we introduce a query template-agnostic representation method to alleviate query dependence. Finally, we construct a large-scale query dataset derived from real-world tables and pretrain ZeroCard on it, enabling it to learn cardinality from schema semantics and predicate representations. After pretraining, ZeroCard's parameters can be frozen and applied in an off-the-shelf manner. We conduct extensive experiments to demonstrate the distinct advantages of ZeroCard and show its practical applications in query optimization. Its zero-dependence property significantly facilitates deployment in real-world scenarios.
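
To make the idea of estimating from a predicted distribution concrete, here is a rough sketch of how a cardinality could be derived once some per-column distribution is available. It is not ZeroCard's model: the histogram below is a hypothetical stand-in for whatever distribution a pretrained model might predict from schema semantics, the row count is assumed to be known from catalog metadata, and combining predicates by multiplying selectivities is the classical independence simplification rather than anything the abstract states.

```python
from typing import Sequence


def range_selectivity(
    bucket_edges: Sequence[float],
    bucket_probs: Sequence[float],
    low: float,
    high: float,
) -> float:
    """Fraction of rows with low <= value < high under a histogram.

    bucket_edges has len(bucket_probs) + 1 ascending entries; values are
    assumed uniformly distributed inside each bucket.
    """
    total = 0.0
    for i, prob in enumerate(bucket_probs):
        left, right = bucket_edges[i], bucket_edges[i + 1]
        # Length of the part of this bucket covered by [low, high).
        overlap = max(0.0, min(high, right) - max(low, left))
        if right > left:
            total += prob * overlap / (right - left)
    return total


def estimate_cardinality(row_count: int, selectivities: Sequence[float]) -> float:
    """Row count times the product of selectivities (independence assumed)."""
    card = float(row_count)
    for sel in selectivities:
        card *= sel
    return card


# Hypothetical predicted distribution for an "age" column (illustrative only).
age_edges = [0, 20, 40, 60, 80]
age_probs = [0.1, 0.4, 0.4, 0.1]
age_sel = range_selectivity(age_edges, age_probs, low=30, high=50)
print(estimate_cardinality(1000, [age_sel]))
```

In this example the predicate covers half of each of the two middle buckets, so the selectivity comes out at roughly 0.4 and the estimate at roughly 400 rows for an assumed table of 1000 rows.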

Problem

Research questions and friction points this paper is trying to address.

Estimating cardinality without target database dependencies
Overcoming generalization limitations in learning-based methods
Leveraging schema semantics to replace data and query reliance

Innovation

Methods, ideas, or system contributions that make the work stand out.

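Predicts data distributions from schema semantics, so no raw data access is needed
Uses a query template-agnostic representation to avoid dependence on query logs
Pretrains on a large-scale query dataset built from real-world tables, so the frozen model needs no retraining on the target database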

Authors

Xianghong Xu
ByteDance, Beijing, China
Rong Kang
ByteDance, Beijing, China
Xiao He
ByteDance, Hangzhou, China
Lei Zhang
ByteDance, San Jose, USA
Jianjun Chen
ByteDance, San Jose, USA
Tieying Zhang
Research Scientist at ByteDance
AI for Systems, Systems for AI