DrugClaw and DrugAudit: A Primary-Source-Grounded Agent and Authority-Aware Benchmark for Drug-Information Question Answering

📅 2026-05-31
🤖 AI Summary
This work addresses the clinical risk posed by large language model hallucinations in drug-related question answering. It proposes DrugClaw, a multi-agent retrieval-augmented system that uses a reflection-driven state-machine workflow to query a registry of drug and pharmacovigilance skills and return traceable answers grounded in primary regulatory or peer-reviewed records. It also introduces DrugAudit, a 3,772-item authority-aware benchmark that scores upstream-of-gold source match, token-level semantic snippet overlap, and citation faithfulness. DrugClaw ranks first on every column of the headline table, with a primary-source rate of 0.918 (+10.1 percentage points over the next-best system), faithfulness of 0.887 (+5.9 pp), and scores of 0.920 and 0.693 on the drug-related subsets of MedQA and PubMedQA.
📝 Abstract
Drug-information question answering is a high-stakes setting where hallucinated facts can mislead clinical decision-making and the provenance of each cited fact matters as much as the fact itself. We present DrugClaw, a multi-agent retrieval-augmented system that queries a registry of drug and pharmacovigilance skills via a reflection-driven state-machine workflow and returns answers grounded in primary regulatory or peer-reviewed records. We also contribute DrugAudit, a 3,772-item authority-aware benchmark with an evaluation panel that scores upstream-of-gold source match, token-level semantic snippet overlap, and citation faithfulness under a dual-judge LLM-as-judge protocol with inter-judge kappa = 0.88 (almost-perfect). Across DrugAudit plus drug-related subsets of MedQA (751) and PubMedQA (512), DrugClaw is top-1 on every column of the headline table: composite Evidence Index under both judges, judge-mediated answer correctness, primary-source rate (0.918, +10.1 pp over next-best), faithfulness (0.887, +5.9 pp), MedQA (0.920), and PubMedQA (0.693).
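To make two of the quantities in the abstract concrete, here is a minimal sketch of how inter-judge agreement and a token-level overlap score could be computed. It is an illustration, not the DrugAudit implementation: `cohen_kappa` is the standard two-rater Cohen's kappa, while `token_f1` is a plain bag-of-tokens F1 used as an assumed stand-in for the paper's "token-level semantic snippet overlap", whose exact definition is not given here. The function names and the sample judge labels are hypothetical.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two raters labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equal-length, non-empty label lists")
    n = len(labels_a)
    # Observed agreement: fraction of items where both judges agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of each judge's marginal label rates.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    expected = sum(counts_a[k] * counts_b[k] for k in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both judges used one identical label throughout
    return (observed - expected) / (1.0 - expected)


def token_f1(predicted, gold):
    """Bag-of-tokens F1 between two snippets (assumed stand-in metric)."""
    pred_tokens = predicted.lower().split()
    gold_tokens = gold.lower().split()
    if not pred_tokens or not gold_tokens:
        return 0.0
    # Multiset intersection counts each shared token at most min(count) times.
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


# Hypothetical binary faithfulness verdicts from two LLM judges.
judge_a = [1, 1, 0, 0, 1, 0, 1, 1, 0, 0]
judge_b = [1, 1, 0, 0, 1, 0, 1, 0, 0, 0]
kappa = cohen_kappa(judge_a, judge_b)
overlap_score = token_f1(
    "warfarin increases bleeding risk",
    "warfarin increases risk of bleeding",
)
```

Kappa corrects raw agreement for the agreement expected by chance, which is why it is commonly reported alongside LLM-as-judge protocols; values above roughly 0.8 are conventionally described as almost perfect agreement.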
Problem

Research questions and friction points this paper is trying to address.

drug-information question answering
hallucination
provenance
primary-source grounding
authority-aware evaluation
Innovation

Methods, ideas, or system contributions that make the work stand out.

retrieval-augmented generation
multi-agent system
primary-source grounding
authority-aware benchmark
citation faithfulness
Qing Wang
University of Florida
AI, Blockchain, AI4Sci, Bioinformatics
Bo Li
University of Macau
AI4Biology, AI Virtual Cell, Phenotypic Drug Discovery, Cell Painting
Jialu Liang
Department of Health Outcomes and Biomedical Informatics, College of Medicine, University of Florida, Florida, USA
Daling Shi
Department of Health Outcomes and Biomedical Informatics, College of Medicine, University of Florida, Florida, USA
Bob Zhang
University of Macau
Biometrics, pattern recognition, image processing
Qianqian Song
Assistant Professor, University of Florida
Translational Bioinformatics, Biomedical Informatics, Artificial Intelligence