🤖 AI Summary
This work addresses a limitation of conventional medical large language models, which typically support only single-turn question answering and cannot provide continuous patient care. The authors propose Baichuan-M4, a clinical-grade agent system designed for longitudinal care. It combines three components: Baichuan-Harness, a unified runtime that supports action constraints and multi-agent collaboration; a core reasoning model; and a clinical tool layer that provides long-context patient memory management, evidence-based retrieval, and multimodal medical perception (X-rays, dermatological images, and document OCR). The core model is trained with a reinforcement learning framework for continuous care that combines SPAR++ span-level reward modeling, reasoning-path compression, and curriculum learning. The system reports leading results across static medical knowledge, dynamic consultation, clinical memory, evidence retrieval, and multimodal understanding, and lowers the hallucination rate to 3.3%.
📝 Abstract
Baichuan-M4 is Baichuan Intelligence's clinical-grade medical large model, designed for continuous care rather than single-turn medical question answering. It is built as a coordinated medical agent system around three pillars: Baichuan-Harness, a unified runtime that keeps reinforcement-learning training and real-world deployment consistent while enforcing action constraints, tool use, long-term patient memory, and multi-agent coordination; a core reasoning model trained with a continuous-care reinforcement-learning framework that integrates span-level reward modeling (SPAR++), reasoning-path compression, curriculum learning, and stabilized policy optimization; and a clinical tool layer for patient-memory management, authoritative evidence-based retrieval, and multimodal medical perception across documents, X-rays, and dermatology. On a cross-dimensional medical evaluation suite, Baichuan-M4 attains leading results in static medical knowledge and safety, dynamic OSCE-style consultation, long-context clinical memory, evidence-based retrieval, medical document OCR, and multimodal image understanding, while lowering the hallucination rate to 3.3%.