SOMA-SQL: Resolving Multi-Source Ambiguity in NL-to-SQL via Synthetic Log and Execution Probing

📅 2026-06-09
🤖 AI Summary
Natural language to SQL translation in real-world scenarios often suffers from multi-source ambiguities arising from vague user intent and complex database schemas, leading to semantic misalignment and generation failures. This work proposes a fully automated active disambiguation mechanism that first generates candidate SQL queries using synthetic query logs, then identifies conflict types through structured ambiguity categorization. It further designs execution-driven probing queries to automatically gather disambiguation evidence, enabling the selection or repair of the optimal SQL. Evaluated on six public benchmarks, the method achieves an average execution accuracy improvement of 13.0% over state-of-the-art models, with gains as high as 16.7% on highly ambiguous questions, substantially enhancing the system’s generalization capability in handling multi-source ambiguities.
📝 Abstract
Natural language interfaces to databases aim to translate user questions into executable SQL, yet remain brittle in real-world settings where questions are underspecified and schemas are large and ambiguous. Ambiguities across user questions, database schemas, and model interpretations are central failure modes in NL2SQL, leading to misaligned intent, incorrect schema grounding, and erroneous SQL generation. Existing approaches rely on human clarification or treat ambiguity as a schema representation problem, but these neither scale nor resolve ambiguity autonomously. We propose SOMA-SQL, which automatically resolves ambiguity via targeted synthetic query logs and ambiguity-driven probing. SOMA-SQL constructs a synthetic query log to ground schema interpretation and guide candidate SQL generation; it then executes targeted probing queries, driven by a structured ambiguity taxonomy and candidate disagreements, to produce disambiguation evidence for final SQL selection and repair. This active approach to ambiguity discovery and resolution generalizes across unseen schemas and query distributions without a human in the loop. Experiments on six public benchmarks demonstrate that SOMA-SQL improves execution accuracy by 13.0% on average over state-of-the-art baselines, with gains of up to 16.7% on ambiguous questions.
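To make the idea of execution-driven probing concrete, here is a minimal sketch in Python using the standard `sqlite3` module. It is not the SOMA-SQL implementation: the toy schema, the hard-coded candidate list, and the helper names (`build_demo_db`, `candidates_disagree`, `probe_value`, `select_candidate`) are all assumptions made for illustration. It covers one simple kind of ambiguity, namely which column a literal from the question refers to, and it omits the synthetic query log and the repair step entirely.

```python
import sqlite3


def build_demo_db():
    """Create a toy in-memory database (hypothetical schema and data)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE employees (name TEXT, city TEXT, office TEXT);
        INSERT INTO employees VALUES
            ('Ann', 'Paris', 'HQ'),
            ('Bob', 'Paris', 'Annex'),
            ('Cy',  'Lyon',  'HQ');
        """
    )
    return conn


# Hypothetical candidates for the question "How many employees are in HQ?".
# In a real system these would come from a model; here they are hard-coded.
CANDIDATES = [
    {"sql": "SELECT COUNT(*) FROM employees WHERE city = 'HQ'", "column": "city"},
    {"sql": "SELECT COUNT(*) FROM employees WHERE office = 'HQ'", "column": "office"},
]


def candidates_disagree(conn, candidates):
    """Return True if the candidates produce different result sets."""
    results = {tuple(conn.execute(c["sql"]).fetchall()) for c in candidates}
    return len(results) > 1


def probe_value(conn, table, column, value):
    """Probing query: count the rows in which `column` holds `value`.

    Table and column names come from the trusted candidate list above,
    never from user input, so interpolating them is acceptable in this sketch.
    """
    query = f"SELECT COUNT(*) FROM {table} WHERE {column} = ?"
    return conn.execute(query, (value,)).fetchone()[0]


def select_candidate(conn, candidates, table, value):
    """Pick the candidate whose column is best supported by probe evidence."""
    if not candidates_disagree(conn, candidates):
        return candidates[0]  # No conflict, so no probing is needed.
    scored = [(probe_value(conn, table, c["column"], value), c) for c in candidates]
    # Highest probe count wins; ties fall back to the earliest candidate.
    return max(scored, key=lambda pair: pair[0])[1]


if __name__ == "__main__":
    conn = build_demo_db()
    best = select_candidate(conn, CANDIDATES, "employees", "HQ")
    print(best["sql"])
```

In this toy setup the two candidates return different counts, so the probe runs against both columns, and the candidate filtering on `office` is chosen because only that column actually contains the literal `HQ`. A fuller system would presumably need different probe types for different ambiguity categories; the abstract does not specify them, so none are sketched here.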
Problem

Research questions and friction points this paper is trying to address.

NL2SQL
ambiguity resolution
schema grounding
natural language interfaces
SQL generation
Innovation

Methods, ideas, or system contributions that make the work stand out.

NL2SQL
ambiguity resolution
synthetic query log
execution probing
schema grounding