🤖 AI Summary
This work addresses the lack of benchmark resources and effective methods for cross-domain, context-dependent Text-to-SQL in Arabic—a low-resource language.
Method: We introduce Ar-SParC, the first multi-turn, cross-domain Arabic Text-to-SQL benchmark, and propose GAT Corrector, a graph attention network–based SQL result refiner. Using GPT-3.5-turbo and GPT-4.5-turbo, we systematically evaluate four question representation methods under zero-shot settings and six in-context learning (ICL) prompting strategies.
Contribution/Results: Across all 40 experiments on Ar-SParC, GAT Corrector consistently improves execution accuracy and interaction accuracy, by an average of 1.9% each under zero-shot settings, demonstrating its effectiveness and generalizability for Arabic NLIDB. This is the first systematic advancement of natural language interfaces to databases (NLIDBs) for Arabic, establishing both a new benchmark and a methodological paradigm for Text-to-SQL research in low-resource languages.
📝 Abstract
In recent years, the task of cross-domain, context-dependent text-to-SQL has received significant attention, as it enables users with no prior knowledge of SQL to converse with databases in natural language. However, most available datasets and research are in English, along with some work in Chinese; to date, no effort has addressed this task in the Arabic language. In this paper, we introduce Ar-SParC, the first Arabic cross-domain, context-dependent text-to-SQL dataset. The dataset consists of 3,450 sequences of interrelated questions, each containing an average of approximately three questions, for a total of 10,225 questions along with their corresponding SQL queries. We conducted 40 experiments on the Ar-SParC dataset using two large language models, GPT-3.5-turbo and GPT-4.5-turbo, applying 10 different prompt engineering techniques: four question representation methods and six in-context learning techniques. Furthermore, we developed a novel approach named GAT Corrector, which enhanced performance across all 40 experiments, yielding an average improvement of 1.9% in execution accuracy (EX) and 1.9% in interaction accuracy (IX) under zero-shot settings, and an average increase of 1.72% EX and 0.92% IX under in-context learning settings. Finally, we conducted an ablation study with two additional experiments to explain why the GAT Corrector outperforms the previous GAT verifier technique, particularly for the Arabic language.