🤖 AI Summary
This work addresses limitations in existing NL2SQL approaches, which rely solely on correct examples for in-context learning and lack structured test-time scaling strategies, leading to insufficient reasoning diversity and a poor trade-off between efficiency and accuracy. To overcome these issues, the authors propose a training-free NL2SQL framework that combines three structured problem decomposition strategies (entity-level, hierarchical, and atomic sequential decomposition) with a retrieval-augmented self-correction mechanism powered by a dynamic memory bank. This approach exploits both successful queries and error-correction pairs to improve generation quality in a zero-shot setting. Evaluated on the BIRD dataset, the method achieves 68.5% execution accuracy, establishing a new state of the art among open-source, non-fine-tuned approaches while reducing computational costs by over an order of magnitude compared to prior test-time scaling methods.
📝 Abstract
Existing NL2SQL systems face two critical limitations: (1) they rely on in-context learning with only correct examples, overlooking the rich signal in historical error-fix pairs that could guide more robust self-correction; and (2) test-time scaling (TTS) approaches often decompose questions arbitrarily, producing near-identical SQL candidates across runs and diminishing ensemble gains. Moreover, these methods suffer from a stark accuracy-efficiency trade-off: high performance demands excessive computation, while fast variants compromise quality. We present Memo-SQL, a training-free framework that addresses these issues through two simple ideas: structured decomposition and experience-aware self-correction. Instead of leaving decomposition to chance, we apply three clear strategies (entity-wise, hierarchical, and atomic sequential) to encourage diverse reasoning. For correction, we build a dynamic memory of both successful queries and historical error-fix pairs, and use retrieval-augmented prompting to bring relevant examples into context at inference time; no fine-tuning or external APIs are required. On BIRD, Memo-SQL achieves 68.5% execution accuracy, setting a new state of the art among open, zero-fine-tuning methods, while using over 10 times fewer resources than prior TTS approaches.
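The memory-and-retrieval idea above can be sketched in a few lines. This is a minimal, self-contained illustration, not the authors' implementation: the `MemoryBank`/`build_correction_prompt` names and the token-overlap (Jaccard) retriever are assumptions standing in for whatever embedding-based retrieval Memo-SQL actually uses.

```python
# Sketch of a dynamic memory bank holding successful queries and
# error-fix pairs, with similarity retrieval to assemble a
# self-correction prompt. All names and the Jaccard retriever are
# illustrative assumptions, not the paper's actual design.
from dataclasses import dataclass, field


def jaccard(a: str, b: str) -> float:
    """Token-overlap similarity, a stand-in for a real retriever."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0


@dataclass
class MemoryBank:
    successes: list = field(default_factory=list)    # (question, sql)
    error_fixes: list = field(default_factory=list)  # (question, bad_sql, fixed_sql)

    def add_success(self, question: str, sql: str) -> None:
        self.successes.append((question, sql))

    def add_error_fix(self, question: str, bad_sql: str, fixed_sql: str) -> None:
        self.error_fixes.append((question, bad_sql, fixed_sql))

    def retrieve(self, question: str, k: int = 2):
        """Return the k most similar entries from each store."""
        by_sim = lambda e: jaccard(question, e[0])
        top_ok = sorted(self.successes, key=by_sim, reverse=True)[:k]
        top_fix = sorted(self.error_fixes, key=by_sim, reverse=True)[:k]
        return top_ok, top_fix


def build_correction_prompt(bank: MemoryBank, question: str, candidate_sql: str) -> str:
    """Mix retrieved successes and error-fix pairs into one in-context prompt."""
    top_ok, top_fix = bank.retrieve(question)
    lines = ["-- Correct examples:"]
    for q, sql in top_ok:
        lines.append(f"Q: {q}\nSQL: {sql}")
    lines.append("-- Past errors and their fixes:")
    for q, bad, fixed in top_fix:
        lines.append(f"Q: {q}\nWrong: {bad}\nFixed: {fixed}")
    lines.append(f"-- Now check and correct this query:\nQ: {question}\nCandidate: {candidate_sql}")
    return "\n".join(lines)
```

A generated SQL candidate would be checked against this prompt by the LLM; when a fix succeeds, the (question, bad, fixed) triple is written back into `error_fixes`, so the memory grows with inference-time experience.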