Benchmarking Local LLMs for Natural-Language-to-SQL Querying in Biopharmaceutical Manufacturing: An Empirical Benchmark on Consumer-Grade Hardware

📅 2026-05-31

🤖 AI Summary
This study addresses the need for local natural-language-to-SQL (NL2SQL) solutions in biopharmaceutical manufacturing environments where GxP compliance restricts the use of cloud services. The authors present PharmaBatchDB AI, a platform that deploys four open-source large language models (Qwen 2.5 Coder 7B, Llama 3.1 8B, Mistral 7B, and Meditron 7B) locally on consumer-grade hardware via Ollama and evaluates them on 60 domain-specific queries against a synthetically generated pharmaceutical manufacturing database. The work provides the first empirical validation of local LLMs for regulatory-compliant data querying under GxP constraints and shows that code-tuned general-purpose models outperform a domain-specific biomedical model on this task. Llama 3.1 8B achieves the highest SQL compliance rate, while Qwen 2.5 Coder 7B leads on ROUGE-L, factual consistency, and hallucination control, although the differences between these two models are not statistically significant. The findings indicate that local NL2SQL systems are feasible in regulated settings, provided that outputs remain subject to human verification.
📝 Abstract
Biopharmaceutical manufacturing organizations operate under regulatory frameworks such as FDA guidance, EU Good Manufacturing Practice (GMP), and the EU AI Act, which can restrict the use of cloud-based artificial intelligence systems. Locally deployed large language models (LLMs) offer a privacy-preserving alternative, but their suitability for pharmaceutical manufacturing tasks remains underexplored. This study evaluates four open-source LLMs (Qwen 2.5 Coder 7B, Llama 3.1 8B, Mistral 7B, and Meditron 7B) deployed locally via Ollama for natural-language-to-SQL generation over a pharmaceutical manufacturing database. A FastAPI-based evaluation platform, PharmaBatchDB AI, was developed using a synthetic Microsoft SQL Server database containing approximately 63,000 records across Batch, Manufacturing Execution System (MES), and Clean-In-Place (CIP) modules. Models were benchmarked on 60 domain-specific natural-language questions using metrics including SQL extraction rate, SQL compliance, factual consistency, ROUGE-L, hallucination rate, throughput, and latency. Qwen 2.5 Coder 7B, Llama 3.1 8B, and Mistral 7B generated SQL for all evaluation tasks, while Meditron 7B failed on nearly all tasks due to context-window limitations and poor SQL generation capability. Llama 3.1 8B achieved the highest SQL compliance, whereas Qwen 2.5 Coder 7B achieved the strongest overall text similarity and factual consistency. Performance differences between the two leading models were not statistically significant. The results show that code-tuned general-purpose LLMs outperform a domain-specific biomedical model on structured query generation for pharmaceutical manufacturing data. Although fully local, GxP-aligned natural-language querying (NLQ) systems are feasible on consumer hardware, current performance levels still require human oversight and downstream validation for regulated use.
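Two of the metrics named in the abstract, SQL extraction rate and ROUGE-L, can be illustrated with a short sketch. The abstract does not give the paper's exact definitions, so everything below is an assumption: extraction is approximated with a regex heuristic over the raw model response, ROUGE-L is computed as an F1 score over whitespace tokens using the longest common subsequence, and the function names, table name, and column names are hypothetical.

```python
import re
from typing import List, Optional

# Assumption: models return SQL either inside a Markdown fence or inline.
SQL_FENCE = re.compile(r"```(?:sql)?\s*(.*?)```", re.IGNORECASE | re.DOTALL)
# Heuristic start of a read-only statement: SELECT, or a CTE "WITH name AS (".
# Prose that happens to contain the word "select" would still match.
SQL_STMT = re.compile(
    r"\b(?:SELECT\b|WITH\s+\w+\s+AS\s*\().*?(?:;|$)",
    re.IGNORECASE | re.DOTALL,
)


def extract_sql(response: str) -> Optional[str]:
    """Return the first SQL-looking statement in a model response, or None."""
    fenced = SQL_FENCE.search(response)
    candidate = fenced.group(1) if fenced else response
    match = SQL_STMT.search(candidate)
    return match.group(0).strip() if match else None


def extraction_rate(responses: List[str]) -> float:
    """Fraction of responses from which a statement could be extracted."""
    if not responses:
        return 0.0
    return sum(extract_sql(r) is not None for r in responses) / len(responses)


def lcs_length(a: List[str], b: List[str]) -> int:
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(candidate: str, reference: str) -> float:
    """ROUGE-L as an F1 score over lowercased whitespace tokens (assumed variant)."""
    cand, ref = candidate.lower().split(), reference.lower().split()
    if not cand or not ref:
        return 0.0
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(cand), lcs / len(ref)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical model outputs; "Batch", "BatchID" and "Status" are illustrative.
    answers = [
        "Here is the query:\n```sql\nSELECT BatchID FROM Batch WHERE Status = 'Released';\n```",
        "I cannot answer that.",
    ]
    print(extraction_rate(answers))
    print(rouge_l_f1("the batch was released", "the batch was rejected"))
```

A regex heuristic like this only shows whether something SQL-shaped was produced. Checking that a statement is valid or compliant would need a separate step, such as parsing it or executing it against the database with read-only permissions.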
Problem

Research questions and friction points this paper is trying to address.

Natural-Language-to-SQL
Local LLMs
Biopharmaceutical Manufacturing
Regulatory Compliance
Consumer-Grade Hardware
Innovation

Methods, ideas, or system contributions that make the work stand out.

local LLMs
natural-language-to-SQL
biopharmaceutical manufacturing
GxP compliance
consumer-grade hardware
Authors
Sagar Bhetwal
Department of Computer Science, University of the Cumberlands, Kentucky, United States
Rajan Bastakoti
Department of Computer Science, DePaul University, Chicago, IL, United States
Nirajan Acharya
University of the Cumberlands
Software Engineering, Natural Language Processing, Deep Learning, Machine Learning
Gaurav Kumar Gupta
Youngstown State University
AI, LLMs, Computer Science, HealthCare, Computer Vision