BiomedSQL: Text-to-SQL for Scientific Reasoning on Biomedical Knowledge Bases

📅 2025-05-23
📈 Citations: 0
Influential: 0
🤖 AI Summary
Modeling implicit scientific reasoning, such as threshold-based judgments and effect-direction identification, remains challenging in biomedical text-to-SQL. Method: We introduce BiomedSQL, the first text-to-SQL benchmark explicitly designed to evaluate scientific reasoning in biomedicine. It covers gene–disease associations, causal inference from omics data, and drug approval records, and comprises 68,000 question/SQL query/answer triples grounded in a harmonized BigQuery knowledge base. We also propose BMSQL, a custom multi-step agent that uses chain-of-thought prompting and interactive execution verification. Contribution/Results: BiomedSQL establishes an evaluation setting that requires domain-specific scientific reasoning rather than syntactic translation alone. Among the evaluated models, GPT-o3-mini achieves 59.0% execution accuracy and the BMSQL agent reaches 62.6%, both well below the expert baseline of 90.0%. The dataset and code are publicly released to advance interpretable, reasoning-aware biomedical query systems.

📝 Abstract
Biomedical researchers increasingly rely on large-scale structured databases for complex analytical tasks. However, current text-to-SQL systems often struggle to map qualitative scientific questions into executable SQL, particularly when implicit domain reasoning is required. We introduce BiomedSQL, the first benchmark explicitly designed to evaluate scientific reasoning in text-to-SQL generation over a real-world biomedical knowledge base. BiomedSQL comprises 68,000 question/SQL query/answer triples grounded in a harmonized BigQuery knowledge base that integrates gene-disease associations, causal inference from omics data, and drug approval records. Each question requires models to infer domain-specific criteria, such as genome-wide significance thresholds, effect directionality, or trial phase filtering, rather than rely on syntactic translation alone. We evaluate a range of open- and closed-source LLMs across prompting strategies and interaction paradigms. Our results reveal a substantial performance gap: GPT-o3-mini achieves 59.0% execution accuracy, while our custom multi-step agent, BMSQL, reaches 62.6%, both well below the expert baseline of 90.0%. BiomedSQL provides a new foundation for advancing text-to-SQL systems capable of supporting scientific discovery through robust reasoning over structured biomedical knowledge bases. Our dataset is publicly available at https://huggingface.co/datasets/NIH-CARD/BiomedSQL, and our code is open-source at https://github.com/NIH-CARD/biomedsql.
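To make the idea of implicit domain criteria concrete, here is a minimal sketch in Python using an in-memory SQLite database rather than BigQuery. The table name `gene_disease_assoc`, its columns, the gene and disease labels, and all row values are hypothetical and invented purely for illustration; they are not taken from the BiomedSQL knowledge base. The only domain fact assumed is the conventional genome-wide significance threshold of p < 5e-8 and the reading of a positive effect size as increased risk.

```python
import sqlite3

# Toy stand-in for an association table. The schema and every row below are
# invented for illustration and do not come from the BiomedSQL knowledge base.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE gene_disease_assoc (gene TEXT, disease TEXT, p_value REAL, beta REAL)"
)
conn.executemany(
    "INSERT INTO gene_disease_assoc VALUES (?, ?, ?, ?)",
    [
        ("GENE_A", "disease_x", 3e-12, 0.40),   # significant, risk-increasing
        ("GENE_B", "disease_x", 2e-4, -0.10),   # not genome-wide significant
        ("GENE_C", "disease_x", 1e-9, -0.25),   # significant, but protective
    ],
)

# Conventional genome-wide significance threshold.
GENOME_WIDE_SIGNIFICANCE = 5e-8

# Question: "Which genes are significantly associated with increased risk of disease_x?"

# A purely syntactic translation matches the disease name and ignores the
# words "significantly" and "increased risk".
naive_sql = "SELECT gene FROM gene_disease_assoc WHERE disease = 'disease_x' ORDER BY gene"

# A reasoning-aware translation turns "significantly" into a p-value threshold
# and "increased risk" into a positive effect direction.
reasoned_sql = """
    SELECT gene FROM gene_disease_assoc
    WHERE disease = 'disease_x'
      AND p_value < ?
      AND beta > 0
    ORDER BY gene
"""

naive_genes = [row[0] for row in conn.execute(naive_sql)]
reasoned_genes = [
    row[0] for row in conn.execute(reasoned_sql, (GENOME_WIDE_SIGNIFICANCE,))
]

print(naive_genes)
print(reasoned_genes)
```

Both queries are valid SQL and both execute, but only the second one encodes the criteria that the question leaves unstated. This is the kind of gap the benchmark is described as probing.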
Problem

Research questions and friction points this paper is trying to address.

Mapping qualitative biomedical questions to executable SQL queries
Handling implicit domain reasoning in text-to-SQL systems
Evaluating scientific reasoning over biomedical knowledge bases
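One common way to score text-to-SQL output is execution accuracy: run the predicted and reference queries and compare their results. The sketch below assumes that definition; the page does not spell out the paper's exact scoring rule, so it may differ. The helper names `run_query` and `execution_accuracy`, the `drugs` table, and its rows are hypothetical.

```python
import sqlite3


def run_query(conn, sql):
    """Return result rows in a canonical order, or None if the query fails."""
    try:
        rows = conn.execute(sql).fetchall()
    except sqlite3.Error:
        return None
    # Sort by repr so row order does not matter and mixed types cannot raise.
    return sorted(rows, key=repr)


def execution_accuracy(conn, pairs):
    """Fraction of (predicted_sql, gold_sql) pairs whose result sets match."""
    if not pairs:
        return 0.0
    correct = 0
    for predicted_sql, gold_sql in pairs:
        gold_rows = run_query(conn, gold_sql)
        predicted_rows = run_query(conn, predicted_sql)
        # A failing prediction (None) never matches a valid gold result.
        if gold_rows is not None and predicted_rows == gold_rows:
            correct += 1
    return correct / len(pairs)


# Toy data, invented for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE drugs (name TEXT, approved INTEGER)")
conn.executemany("INSERT INTO drugs VALUES (?, ?)", [("drug_a", 1), ("drug_b", 0)])

gold = "SELECT name FROM drugs WHERE approved = 1"
pairs = [
    ("SELECT name FROM drugs WHERE approved != 0", gold),  # different text, same rows
    ("SELECT name FROM drugs", gold),                      # runs, but misses the filter
    ("SELEC name FROM drugs", gold),                       # syntax error
]

score = execution_accuracy(conn, pairs)
print(f"{score:.3f}")
```

Comparing results rather than query text credits a prediction that is written differently from the reference but returns the same rows, and it penalizes a query that runs cleanly yet omits a required filter.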
Innovation

Methods, ideas, or system contributions that make the work stand out.

BiomedSQL benchmark for scientific text-to-SQL
Multi-step agent BMSQL improves execution accuracy
Integrates gene-disease associations and omics data
Authors
Mathew J. Koretsky
1 Center for Alzheimer’s Disease and Related Dementias, NIA, NIH; 2 DataTecnica LLC
Maya Willey
1 Center for Alzheimer’s Disease and Related Dementias, NIA, NIH; 2 DataTecnica LLC
Adi Asija
2 DataTecnica LLC; 3 Johns Hopkins University
Owen Bianchi
1 Center for Alzheimer’s Disease and Related Dementias, NIA, NIH; 2 DataTecnica LLC
Chelsea X. Alvarado
1 Center for Alzheimer’s Disease and Related Dementias, NIA, NIH; 2 DataTecnica LLC
Tanay Nayak
2 DataTecnica LLC; 3 Johns Hopkins University
Nicole Kuznetsov
1 Center for Alzheimer’s Disease and Related Dementias, NIA, NIH; 2 DataTecnica LLC
Sungwon Kim
2 DataTecnica LLC; 3 Johns Hopkins University
Mike A. Nalls
Founder/consultant with Data Tecnica International + Data science lead at NIH’s Center for Alzheimer
statistical genetics, neurodegeneration, data science, biostatistics, genomics
Daniel Khashabi
Johns Hopkins University
Natural Language Processing, Artificial Intelligence, Machine Learning
Faraz Faghri
National Institutes of Health
Computer science, Neuroscience, Health, Aging, Complex diseases