🤖 AI Summary
This study addresses the challenge of ensuring both regulatory compliance and factual accuracy when using LLMs to generate informed consent forms (ICFs), a high-stakes document type in clinical trials. It introduces InformGen, an LLM-driven copilot for ICF drafting that: (1) explicitly encodes 18 core compliance rules derived from FDA guidelines; (2) is evaluated on a new benchmark dataset of protocols and ICFs from 900 clinical trials; and (3) integrates protocol parsing, rule-guided generation, inline source citation, and human-in-the-loop feedback. Its key contribution is a traceable, verifiable, compliance-aware generation approach. Experiments show that InformGen achieves 99.7% compliance with the core rules, up to 30% higher than a vanilla GPT-4o model, and that, combined with manual intervention, it attains over 90% factual accuracy in a user study with five annotators, compared with 57%–82% for vanilla GPT-4o. The work points toward trustworthy LLM-assisted generation of high-stakes medical documentation.
📝 Abstract
Leveraging large language models (LLMs) to generate high-stakes documents, such as informed consent forms (ICFs), remains a significant challenge because such documents demand strict regulatory compliance and factual accuracy. Here, we present InformGen, an LLM-driven copilot for accurate and compliant ICF drafting that combines optimized knowledge document parsing and content generation with humans in the loop. We further construct a benchmark dataset comprising protocols and ICFs from 900 clinical trials. Experimental results demonstrate that InformGen achieves nearly 100% compliance with 18 core regulatory rules derived from FDA guidelines, outperforming a vanilla GPT-4o model by up to 30%. Additionally, a user study with five annotators shows that InformGen, when integrated with manual intervention, attains over 90% factual accuracy, significantly surpassing the vanilla GPT-4o model's 57%-82%. Crucially, InformGen ensures traceability by providing inline citations to source protocols, enabling easy verification of generated content against the protocol.
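To make the two evaluation ideas in the abstract concrete (rule compliance and citation traceability), here is a minimal Python sketch. It is not InformGen's implementation. The citation format `[P:<section id>]`, the `Rule` class, the keyword-matching check, and both function names are assumptions made purely for illustration; a real compliance check would need far more than keyword matching.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Hypothetical inline-citation format "[P:<section id>]"; not taken from the paper.
CITATION = re.compile(r"\[P:([A-Za-z0-9_.-]+)\]")


@dataclass
class Rule:
    """Illustrative stand-in for one regulatory rule (assumed structure)."""
    name: str
    keywords: Tuple[str, ...]  # rule counts as satisfied if any keyword appears


def compliance_rate(icf_text: str, rules: List[Rule]) -> float:
    """Fraction of rules with at least one keyword present in the ICF draft."""
    if not rules:
        raise ValueError("at least one rule is required")
    text = icf_text.lower()
    passed = sum(any(k.lower() in text for k in rule.keywords) for rule in rules)
    return passed / len(rules)


def unresolved_citations(icf_text: str, protocol_sections: Dict[str, str]) -> List[str]:
    """Sorted section ids cited in the draft but absent from the parsed protocol."""
    cited = CITATION.findall(icf_text)
    return sorted({c for c in cited if c not in protocol_sections})


# Toy data, invented for this example only.
protocol = {"5.1": "Participation is voluntary.", "7.2": "Nausea was observed."}
draft = (
    "Participation is voluntary [P:5.1]. Risks include nausea [P:7.2]. "
    "You may withdraw at any time [P:9.9]."
)
rules = [
    Rule("voluntary participation", ("voluntary",)),
    Rule("risks", ("risk",)),
    Rule("compensation", ("compensation",)),
    Rule("withdrawal", ("withdraw",)),
]

rate = compliance_rate(draft, rules)
dangling = unresolved_citations(draft, protocol)
```

In this toy draft, three of the four rules are matched and the citation to section 9.9 does not resolve, which is the kind of item a human reviewer in the loop would be asked to check.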