SchemaGraphSQL: Efficient Schema Linking with Pathfinding Graph Algorithms for Text-to-SQL on Large-Scale Databases

📅 2025-05-23

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

To address inefficient schema linking and inaccurate multi-hop JOIN reasoning in large-scale Text-to-SQL, this paper proposes a zero-shot, training-free schema linking paradigm driven by graph paths. The method constructs a schema graph from foreign-key relations, uses Gemini 2.5 Flash with lightweight prompts for zero-shot extraction of table nodes, and automatically identifies optimal multi-hop JOIN paths via Dijkstra/A* search, followed by rule-based post-processing to generate executable JOIN sequences. The core contribution is the first zero-shot, fine-tuning-free approach that enables precise multi-hop linkage inference without complex LLM chaining. Evaluated on the BIRD benchmark, it achieves state-of-the-art execution accuracy, outperforming both fine-tuned models and multi-step LLM methods. Moreover, it scales linearly to schemas with thousands of tables while reducing inference cost by over 60%.

📝 Abstract

Text-to-SQL systems translate natural language questions into executable SQL queries, and recent progress with large language models (LLMs) has driven substantial improvements in this task. Schema linking remains a critical component in Text-to-SQL systems, reducing prompt size for models with narrow context windows and sharpening model focus even when the entire schema fits. We present a zero-shot, training-free schema linking approach that first constructs a schema graph based on foreign key relations, then uses a single prompt to Gemini 2.5 Flash to extract source and destination tables from the user query, followed by applying classical path-finding algorithms and post-processing to identify the optimal sequence of tables and columns that should be joined, enabling the LLM to generate more accurate SQL queries. Despite being simple, cost-effective, and highly scalable, our method achieves state-of-the-art results on the BIRD benchmark, outperforming previous specialized, fine-tuned, and complex multi-step LLM-based approaches. We conduct detailed ablation studies to examine the precision-recall trade-off in our framework. Additionally, we evaluate the execution accuracy of our schema filtering method compared to other approaches across various model sizes.
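The graph and path-finding steps described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the foreign keys, table names, and the helper names `build_schema_graph`, `shortest_join_path`, and `join_clause` are all hypothetical, and the source and destination tables are hard-coded where the paper would obtain them from a single LLM prompt. Because the abstract only says "classical path-finding algorithms", the sketch assumes an unweighted graph and uses breadth-first search, which on such a graph finds the same fewest-hop path that Dijkstra's algorithm would with unit edge weights.

```python
from collections import deque

# Hypothetical foreign keys: (child_table, child_column, parent_table, parent_column).
FOREIGN_KEYS = [
    ("orders", "customer_id", "customers", "id"),
    ("order_items", "order_id", "orders", "id"),
    ("order_items", "product_id", "products", "id"),
    ("products", "supplier_id", "suppliers", "id"),
]


def build_schema_graph(foreign_keys):
    """Build an undirected adjacency map: table -> {neighbor: (own_col, neighbor_col)}."""
    graph = {}
    for child, child_col, parent, parent_col in foreign_keys:
        graph.setdefault(child, {})[parent] = (child_col, parent_col)
        graph.setdefault(parent, {})[child] = (parent_col, child_col)
    return graph


def shortest_join_path(graph, source, destination):
    """Return the fewest-hop list of tables from source to destination, or None."""
    if source not in graph or destination not in graph:
        return None
    parents = {source: None}  # doubles as the visited set
    queue = deque([source])
    while queue:
        table = queue.popleft()
        if table == destination:
            # Walk the parent pointers back to the source.
            path = []
            while table is not None:
                path.append(table)
                table = parents[table]
            return path[::-1]
        for neighbor in graph[table]:
            if neighbor not in parents:
                parents[neighbor] = table
                queue.append(neighbor)
    return None


def join_clause(graph, path):
    """Render a FROM/JOIN clause from consecutive tables on the path."""
    lines = [f"FROM {path[0]}"]
    for left, right in zip(path, path[1:]):
        left_col, right_col = graph[left][right]
        lines.append(f"JOIN {right} ON {left}.{left_col} = {right}.{right_col}")
    return "\n".join(lines)


graph = build_schema_graph(FOREIGN_KEYS)
# In the paper these two tables would come from the LLM; here they are fixed.
path = shortest_join_path(graph, "customers", "suppliers")
print(path)
print(join_clause(graph, path))
```

The tables and join columns on the resulting path are what a schema filter of this kind would pass to the SQL-generating model in place of the full schema. A real system would also need to handle composite keys, multiple foreign keys between the same pair of tables, and disconnected tables, none of which this sketch attempts.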

Problem

Research questions and friction points this paper is trying to address.

Improves schema linking in Text-to-SQL using pathfinding algorithms

Reduces prompt size for LLMs with narrow context windows

Enhances SQL query accuracy on large-scale databases

Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses schema graph with foreign key relations

Applies pathfinding algorithms for optimal joins

Leverages Gemini 2.5 Flash for table extraction
