AI Summary
This study addresses text-to-SQL translation under resource-constrained settings, systematically evaluating three lightweight Transformer models (T5-Small, BART-Small, and GPT-2) on the Spider benchmark under both zero-shot and fine-tuned paradigms. We propose a lightweight, model-agnostic processing pipeline comprising a unified database schema formatting method and a schema-aware generation strategy tailored for encoder-decoder architectures. Performance is assessed using Logical Form Accuracy (LFAcc), BLEU, and Exact Match (EM). Fine-tuned T5-Small achieves 27.8% LFAcc, outperforming BART-Small (23.98%) and GPT-2 (20.1%). The findings indicate that efficient, reusable NL2SQL systems built on lightweight encoder-decoder models are feasible in low-resource environments. This work offers a practical pathway for NL2SQL deployment in education and business intelligence applications, where computational efficiency and minimal infrastructure overhead are critical.
Abstract
Text-to-SQL translation enables non-expert users to query relational databases using natural language, with applications in education and business intelligence. This study evaluates three lightweight transformer models (T5-Small, BART-Small, and GPT-2) on the Spider dataset, focusing on low-resource settings. We developed a reusable, model-agnostic pipeline that tailors schema formatting to each model's architecture, trained each model for 1,000 to 5,000 iterations, and evaluated on 1,000 test samples using Logical Form Accuracy (LFAcc), BLEU, and Exact Match (EM) metrics. Fine-tuned T5-Small achieves the highest LFAcc (27.8%), outperforming BART-Small (23.98%) and GPT-2 (20.1%), which suggests that encoder-decoder models are better suited to schema-aware SQL generation. Although resource constraints limit absolute performance, the pipeline's modularity supports future enhancements, such as advanced schema linking or alternative base models. This work underscores the potential of compact transformers for accessible text-to-SQL solutions in resource-scarce environments.
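The abstract does not spell out the schema format or the exact normalization behind the Exact Match metric, so the following is only a minimal sketch of what architecture-specific schema formatting and an EM score could look like. The function names (`serialize_schema`, `build_input`, `normalize_sql`, `exact_match`), the prompt templates, and the `table(col, col)` serialization are all hypothetical choices made for illustration, not the authors' implementation.

```python
import re


def serialize_schema(schema: dict[str, list[str]]) -> str:
    """Flatten {table: [columns]} into 'table(col, col) | table(col)'.

    The separator and layout are assumptions, not the paper's format.
    """
    return " | ".join(
        f"{table}({', '.join(columns)})" for table, columns in schema.items()
    )


def build_input(question: str, schema: dict[str, list[str]], model_family: str) -> str:
    """Build a model input string; both templates are hypothetical."""
    schema_text = serialize_schema(schema)
    if model_family == "encoder-decoder":  # e.g. T5-Small, BART-Small
        return f"translate to SQL: {question} schema: {schema_text}"
    if model_family == "decoder-only":  # e.g. GPT-2
        # A decoder-only model continues the text after the 'SQL:' cue.
        return f"Schema: {schema_text}\nQuestion: {question}\nSQL:"
    raise ValueError(f"unknown model family: {model_family}")


def normalize_sql(sql: str) -> str:
    """Drop a trailing semicolon, collapse whitespace, and lowercase.

    Lowercasing also changes string literals, which is a simplification.
    """
    collapsed = re.sub(r"\s+", " ", sql.strip().rstrip(";"))
    return collapsed.strip().lower()


def exact_match(predictions: list[str], references: list[str]) -> float:
    """Fraction of predictions equal to their reference after normalization."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0
    hits = sum(
        normalize_sql(pred) == normalize_sql(ref)
        for pred, ref in zip(predictions, references)
    )
    return hits / len(references)
```

Under these assumptions, the same `(question, schema)` pair is rendered differently for T5-style and GPT-2-style models, while the scoring code stays shared across models, which is one way a pipeline can be model-agnostic yet architecture-aware.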