Knapsack Optimization-based Schema Linking for LLM-based Text-to-SQL Generation

📅 2025-02-18

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This paper addresses the degradation in Text-to-SQL performance caused by inaccurate schema linking. We propose KaSLA, a hierarchical schema linking method based on knapsack optimization. It splits linking into two knapsack problems, tables first and then columns within the selected tables, and in each stage maximizes recall of relevant schema elements under a user-defined tolerance for redundant ones. We introduce a restricted missing indicator so that the linking metric reflects missing relevant elements, and we leverage LLM-derived embeddings to model semantic similarity, which allows plug-and-play integration into existing Text-to-SQL systems. In experiments, KaSLA-1.6B achieves better schema linking results than large-scale LLMs, including DeepSeek-V3 paired with a state-of-the-art schema linking method. On the Spider and BIRD benchmarks, replacing the schema linking module of state-of-the-art Text-to-SQL systems with KaSLA yields significant improvements in SQL execution accuracy, demonstrating its generality and effectiveness.

📝 Abstract

Generating SQL from user queries is a long-standing challenge, where the accuracy of initial schema linking significantly impacts subsequent SQL generation performance. However, current schema linking models still struggle with missing relevant schema elements or an excess of redundant ones. A crucial reason for this is that the commonly used metrics, recall and precision, fail to capture missing relevant elements and thus cannot reflect actual schema linking performance. Motivated by this, we propose an enhanced schema linking metric by introducing a restricted missing indicator. Accordingly, we introduce the Knapsack optimization-based Schema Linking Agent (KaSLA), a plug-in schema linking agent designed to prevent relevant schema elements from being missed while minimizing the inclusion of redundant ones. KaSLA employs a hierarchical linking strategy that first identifies the optimal table linking and subsequently links columns within the selected tables to reduce the linking candidate space. In each linking process, it utilizes a knapsack optimization approach to link potentially relevant elements while accounting for a limited tolerance of potentially redundant ones. With this optimization, KaSLA-1.6B achieves superior schema linking results compared to large-scale LLMs, including DeepSeek-V3 with a state-of-the-art (SOTA) schema linking method. Extensive experiments on the Spider and BIRD benchmarks verify that KaSLA can significantly improve the SQL generation performance of SOTA text-to-SQL models by substituting their schema linking processes.
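
The hierarchical, knapsack-style linking described in the abstract can be pictured with a small sketch. This is an illustration under stated assumptions, not the paper's implementation: the abstract does not give the objective, so here each schema element is assumed to carry a relevance score (the knapsack value) and an integer redundancy cost (the knapsack weight), and the capacity stands in for the redundancy tolerance. The function names `knapsack_select` and `hierarchical_link`, the scores, and the table and column names are all hypothetical.

```python
from typing import Dict, List, Tuple

# (name, value, weight): value = assumed relevance score,
# weight = assumed integer redundancy cost. Both are illustrative.
Item = Tuple[str, float, int]


def knapsack_select(items: List[Item], capacity: int) -> List[str]:
    """Standard 0/1 knapsack: maximize total value with total weight <= capacity."""
    n = len(items)
    # best[i][c] = best total value using the first i items within capacity c
    best = [[0.0] * (capacity + 1) for _ in range(n + 1)]
    for i, (_, value, weight) in enumerate(items, start=1):
        for c in range(capacity + 1):
            best[i][c] = best[i - 1][c]  # option 1: skip item i
            if weight <= c and best[i - 1][c - weight] + value > best[i][c]:
                best[i][c] = best[i - 1][c - weight] + value  # option 2: take it
    # Walk back through the table to recover which items were taken.
    chosen: List[str] = []
    c = capacity
    for i in range(n, 0, -1):
        if best[i][c] != best[i - 1][c]:
            name, _, weight = items[i - 1]
            chosen.append(name)
            c -= weight
    return chosen[::-1]


def hierarchical_link(
    tables: List[Item],
    columns: Dict[str, List[Item]],
    table_capacity: int,
    column_capacity: int,
) -> Dict[str, List[str]]:
    """Stage 1 picks tables; stage 2 picks columns only inside the picked tables."""
    selected_tables = knapsack_select(tables, table_capacity)
    return {t: knapsack_select(columns[t], column_capacity) for t in selected_tables}


# Hypothetical scores for a question such as "names of singers with a concert in 2014".
tables = [("singer", 0.9, 1), ("concert", 0.7, 2), ("stadium", 0.2, 3)]
columns = {
    "singer": [("singer.name", 0.8, 1), ("singer.age", 0.1, 1), ("singer.id", 0.6, 1)],
    "concert": [("concert.singer_id", 0.6, 1), ("concert.year", 0.5, 1), ("concert.theme", 0.1, 1)],
    "stadium": [("stadium.name", 0.2, 1)],
}
linked = hierarchical_link(tables, columns, table_capacity=3, column_capacity=2)
print(linked)
```

Restricting the second stage to columns of the selected tables is what shrinks the candidate space: the columns of `stadium` are never scored against the column budget once the table stage has dropped it.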

Problem

Research questions and friction points this paper is trying to address.

Improve schema linking accuracy

Minimize redundant schema elements

Enhance SQL generation performance

Innovation

Methods, ideas, or system contributions that make the work stand out.

Knapsack optimization-based linking

Hierarchical table and column linking

Restricted missing indicator metric
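
To see why precision and recall alone can misjudge schema linking, consider a minimal sketch. The page does not give the definition of the restricted missing indicator, so the `complete` flag below is only an assumed all-or-nothing stand-in (1 when no gold element is missing, 0 otherwise), and the function name `linking_scores` and the example sets are hypothetical.

```python
from typing import Dict, Set


def linking_scores(predicted: Set[str], gold: Set[str]) -> Dict[str, float]:
    """Precision, recall, and an assumed 'nothing missing' flag for one question."""
    hits = predicted & gold
    precision = len(hits) / len(predicted) if predicted else 0.0
    recall = len(hits) / len(gold) if gold else 1.0
    # Assumed stand-in for a missing indicator: 1.0 only if every gold element is linked.
    complete = 1.0 if gold <= predicted else 0.0
    return {"precision": precision, "recall": recall, "complete": complete}


gold = {"singer.name", "singer.id", "concert.singer_id", "concert.year"}

# Linking A: no redundant columns, but one required column is missing.
linking_a = {"singer.name", "singer.id", "concert.singer_id"}
# Linking B: every required column plus two redundant ones.
linking_b = gold | {"singer.age", "concert.theme"}

scores_a = linking_scores(linking_a, gold)
scores_b = linking_scores(linking_b, gold)
print(scores_a)
print(scores_b)
```

Linking A has perfect precision and a recall of 0.75, yet the correct SQL cannot be written from it because `concert.year` is absent. Linking B has lower precision but still contains everything the query needs, which is the distinction a missing-aware metric is meant to surface.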

🔎 Similar Papers

Structure Guided Large Language Model for SQL Generation