Deterministic Integrity Gates for LLM-Assisted Clinical Manuscript Preparation: An Auditable Biomedical Informatics Architecture

📅 2026-06-08

🤖 AI Summary

Large language models (LLMs) increasingly draft clinical research manuscripts, but fluent output can hide fabricated citations, numbers that drift from source tables, and unmet reporting-guideline items, and existing tools generate text without verifying it. This work proposes an architecture that pairs generation with verification: the workflow is decomposed into 43 skills, including 21 deterministic standard-library detectors, coordinated by one orchestrator. Every stage transition is gated with halt-on-failure, and each integrity question is resolved by a deterministic, re-executable check where one suffices, with prose-level probes reserved for cases where interpretation is unavoidable. On three reproducible public-dataset pipelines (STARD, PRISMA, STROBE), every content-hash manifest verified clean. In a seeded-defect ablation, the deterministic gates detected all 27 injected defects with no false positives on the matched clean fixtures, whereas a generic single-prompt LLM reviewer detected 11.

📝 Abstract

Objective. Large language models (LLMs) increasingly draft clinical research manuscripts, but their fluency can hide fabricated citations, numbers that drift from source tables, and unmet reporting-guideline items. Existing tools generate text without verifying it, and self-critique inherits the blind spots that produce confident fabrication. We describe an architecture that pairs generation with verification.

Methods. The design rests on three principles: decompose the workflow into self-contained skills, gate every stage transition with halt-on-failure, and resolve each integrity question with the cheapest sufficient mechanism -- a deterministic, re-executable check where one suffices, and a prose-level probe only where interpretation is unavoidable. This determinism-where-possible split, organized as an integrity-gate taxonomy, is the core contribution. It is realized as MedSci Skills, an open-source toolkit of 43 skills coordinated by one orchestrator, whose deterministic tier comprises 21 standard-library detectors. We evaluate it on three reproducible public-dataset pipelines (STARD, PRISMA, STROBE) and a seeded-defect ablation.

Results. Across the three pipelines every content-hash manifest verified clean and the gates surfaced real defects. On 27 identical injected defects the deterministic gates detected all 27 with no false positives on the matched clean fixtures, whereas a generic single-prompt LLM reviewer detected 11, its misses concentrated in generated-code, bibliography-internal, and style defects the prose does not expose.

Conclusion. Determinism-where-possible verification yields an auditable, re-executable trail that exposes the evidence a human needs to check an LLM-assisted manuscript -- feasibility and reproducibility evidence, not a claim of human-competitive quality, which a separate blinded study addresses. MedSci Skills is MIT-licensed and archived (v3.8.0).
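The gating idea in the Methods can be illustrated with a minimal sketch that uses only the Python standard library. It is not the MedSci Skills implementation: every name here (`GateFailure`, `build_manifest`, `verify_manifest`, `numbers_missing_from_table`, `run_gate`) and the sample table are hypothetical, chosen only to show a content-hash manifest, one deterministic number-drift detector, and a halt-on-failure gate.

```python
import hashlib
import re


class GateFailure(Exception):
    """Raised when a gate halts the pipeline (hypothetical name)."""


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def build_manifest(artifacts: dict[str, bytes]) -> dict[str, str]:
    """Record a content hash for every artifact a stage produced."""
    return {name: sha256_hex(data) for name, data in artifacts.items()}


def verify_manifest(artifacts: dict[str, bytes], manifest: dict[str, str]) -> list[str]:
    """Re-hash the artifacts and list every deviation from the manifest."""
    problems = []
    for name, expected in sorted(manifest.items()):
        if name not in artifacts:
            problems.append(f"missing artifact: {name}")
        elif sha256_hex(artifacts[name]) != expected:
            problems.append(f"hash mismatch: {name}")
    return problems


def numbers_missing_from_table(prose: str, table_values: set[str]) -> list[str]:
    """Illustrative detector: decimal numbers in the prose that the source table lacks."""
    return [n for n in re.findall(r"\d+\.\d+", prose) if n not in table_values]


def run_gate(name: str, findings: list[str]) -> None:
    """Halt-on-failure: any finding stops the stage transition."""
    if findings:
        raise GateFailure(f"{name}: " + "; ".join(findings))


# Assumed sample data, invented for this sketch.
artifacts = {"table1.csv": b"metric,value\nsensitivity,0.91\nspecificity,0.84\n"}
manifest = build_manifest(artifacts)
table_values = {"0.91", "0.84"}
clean_prose = "Sensitivity was 0.91 and specificity was 0.84."
drifted_prose = "Sensitivity was 0.93 and specificity was 0.84."
```

Calling `run_gate("number-drift", numbers_missing_from_table(drifted_prose, table_values))` raises `GateFailure`, while the same call on `clean_prose` returns silently. Because the check is a pure function of its inputs, re-running it on the same artifacts gives the same verdict, which is the property the abstract calls re-executable.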

Problem

Research questions and friction points this paper is trying to address.

Large Language Models

Clinical Manuscript Preparation

Integrity Verification

Auditable Architecture

Biomedical Informatics

Innovation

Methods, ideas, or system contributions that make the work stand out.

Deterministic Integrity Gates

LLM-Assisted Manuscript Verification

Auditable Biomedical Informatics