🤖 AI Summary
Existing clinical agents for medical order generation mostly make coarse-grained decisions and struggle to produce fine-grained, executable orders. This work proposes CAREAgent, the first structured reasoning framework specifically designed for generating executable clinical orders. Its training data is built in two stages: an agent framework first constructs verifiable reasoning trajectories aligned with realistic clinical tool usage, and the trajectories are then filtered by format compliance, order validity, and clinical plausibility. The model is trained with supervised fine-tuning followed by reinforcement learning with multi-dimensional rewards. On the ClinicalBench benchmark, which is unseen during training, CAREAgent improves the F1 score by 5.05%, 2.09%, and 0.86% over current state-of-the-art single-agent, multi-agent, and agentic reasoning methods, respectively.
📝 Abstract
Clinical order generation serves as a critical bridge between clinical decision-making and real-world practice, translating medical decisions into concrete and executable orders. Existing agents mainly focus on coarse-grained decisions and overlook the fine-grained, executable information required for clinical orders. To address this gap, we propose CAREAgent, an agent for clinical order generation. To support its training, we introduce a two-stage agentic reasoning data construction method. First, we design an agent framework that constructs verifiable reasoning trajectories aligned with realistic clinical tool usage. Second, we filter reasoning trajectories by format compliance, order validity, and clinical plausibility. Building on the constructed data, the model is first trained via supervised fine-tuning to acquire fundamental reasoning formats and medical knowledge, and is subsequently optimized through reinforcement learning with multi-dimensional reward functions to enhance complex clinical reasoning capabilities. Experiments on multiple benchmarks demonstrate the effectiveness of CAREAgent. On ClinicalBench (unseen during training), CAREAgent improves the F1 score by 5.05%, 2.09%, and 0.86% over the single-agent, multi-agent, and agentic reasoning methods, respectively.
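The filtering stage and the F1 metric mentioned above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the `Trajectory` container, the `filter_trajectories` and `order_f1` helpers, the idea of representing each filter criterion as a boolean check, and the choice to score orders as a case-insensitive set match. The abstract does not specify how the checks or the F1 computation are actually implemented.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class Trajectory:
    """Hypothetical container: one reasoning trace and the orders it ends with."""
    reasoning: str
    orders: List[str]


# A check returns True when the trajectory passes one filter criterion
# (for example format compliance, order validity, or clinical plausibility).
Check = Callable[[Trajectory], bool]


def filter_trajectories(
    trajectories: Iterable[Trajectory], checks: List[Check]
) -> List[Trajectory]:
    """Keep only the trajectories that pass every supplied check."""
    return [t for t in trajectories if all(check(t) for check in checks)]


def order_f1(predicted: Iterable[str], reference: Iterable[str]) -> float:
    """Set-level F1 between predicted and reference orders.

    Assumption: orders match when equal after trimming and lowercasing.
    """
    pred = {o.strip().lower() for o in predicted}
    ref = {o.strip().lower() for o in reference}
    true_positives = len(pred & ref)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(pred)
    recall = true_positives / len(ref)
    return 2 * precision * recall / (precision + recall)


# Toy checks standing in for the three criteria; real ones would be far richer.
TOY_FORMULARY = {"cbc", "chest x-ray", "amoxicillin 500 mg po tid"}


def has_valid_format(t: Trajectory) -> bool:
    return bool(t.reasoning.strip()) and len(t.orders) > 0


def orders_are_valid(t: Trajectory) -> bool:
    return all(o.strip().lower() in TOY_FORMULARY for o in t.orders)


def is_plausible(t: Trajectory) -> bool:
    # Placeholder: a real plausibility check would need clinical review or a judge model.
    return len(set(t.orders)) == len(t.orders)


candidates = [
    Trajectory("Fever and cough, order labs and imaging.", ["CBC", "Chest X-ray"]),
    Trajectory("", ["CBC"]),                         # fails the format check
    Trajectory("Order an unknown item.", ["Xyz"]),   # fails the validity check
]
kept = filter_trajectories(
    candidates, [has_valid_format, orders_are_valid, is_plausible]
)
score = order_f1(kept[0].orders, ["CBC", "Amoxicillin 500 mg PO TID"])
```

In this toy run only the first candidate survives filtering, and its two predicted orders share one item with the two reference orders, so precision and recall are both 0.5. A real pipeline would replace the toy checks and the exact-match rule with whatever the benchmark defines.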