AI Summary
This work addresses a limitation of existing large language model-based medical agents: when patient information is incomplete, they struggle to learn multiple diagnostic reasoning paradigms without the paradigms interfering with one another. The authors propose PACT (Periodic Anchor Consensus Training). At the data level, a Doctor-Patient-Supervisor (DPS) triadic dialogue synthesis mechanism uses complete medical records for quality control while restricting the doctor agent to patient-visible information, which prevents answer leakage. At the training level, PACT fine-tunes one LoRA Branch per paradigm and periodically aggregates the Branches into a shared Anchor through sign consensus, enabling decoupled learning and co-evolution of the diagnostic strategies. The framework is evaluated on a dynamic multi-turn Chinese clinical consultation benchmark. Experiments show that PACT outperforms the compared proprietary, medical-specialized, and task-adapted baselines on both diagnostic accuracy and consultation-process metrics.
Abstract
Clinical diagnosis requires flexible use of multiple reasoning paradigms under incomplete patient information. Existing LLM-based medical agents show strong medical reasoning ability, but single-paradigm or naively mixed dialogue supervision makes these paradigms difficult to learn without interference. We propose \textbf{PACT} (Periodic Anchor Consensus Training), a framework that couples supervised multi-paradigm dialogue synthesis with consensus-based Branch training. At the data level, \textbf{DPS} (Doctor-Patient-Supervisor) uses complete electronic medical records (EMRs) for quality control while keeping the doctor agent restricted to patient-visible information. This produces validated dialogues under four diagnostic reasoning paradigms without leaking hidden clinical answers. At the training level, PACT trains one paradigm-specific LoRA Branch per paradigm and periodically aggregates Branches into a shared Anchor through sign consensus. We further construct a dynamic multi-turn Chinese medical diagnosis benchmark for interactive consultation. Experiments show that PACT achieves state-of-the-art performance among compared proprietary, medical-specialized, and task-adapted baselines on diagnostic outcome and consultation-process metrics.
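The abstract does not spell out how Branches are merged into the Anchor, so the following is only a minimal sketch of one plausible form of sign-consensus aggregation, not the authors' implementation. It assumes each Branch is represented by its update relative to the Anchor, and that a coordinate is kept only when a strict majority of Branches agree on its sign. The names `sign_consensus_aggregate`, `apply_to_anchor`, `min_agreement`, and `step` are hypothetical.

```python
from typing import Dict, List

# Parameter name -> flattened update of one Branch relative to the Anchor.
Delta = Dict[str, List[float]]


def sign_consensus_aggregate(branch_deltas: List[Delta],
                             min_agreement: float = 0.5) -> Delta:
    """Merge Branch updates, keeping only coordinates with a majority sign.

    Assumed rule (not taken from the paper): a coordinate survives only if
    more than `min_agreement` of the Branches share its sign; the surviving
    value is the mean of the agreeing Branches, otherwise it is zeroed.
    """
    if not branch_deltas:
        raise ValueError("need at least one branch")
    merged: Delta = {}
    for name in branch_deltas[0]:
        merged_values = []
        # Iterate coordinate by coordinate across all Branches.
        for values in zip(*(delta[name] for delta in branch_deltas)):
            positives = [v for v in values if v > 0]
            negatives = [v for v in values if v < 0]
            majority = positives if len(positives) >= len(negatives) else negatives
            if majority and len(majority) / len(values) > min_agreement:
                merged_values.append(sum(majority) / len(majority))
            else:
                merged_values.append(0.0)  # no consensus: leave the Anchor unchanged
        merged[name] = merged_values
    return merged


def apply_to_anchor(anchor: Delta, merged: Delta, step: float = 1.0) -> Delta:
    """Move the shared Anchor along the consensus update."""
    return {name: [a + step * d for a, d in zip(anchor[name], merged[name])]
            for name in anchor}
```

In a periodic scheme of this kind, each paradigm-specific Branch would train for some number of steps, the consensus update would be applied to the Anchor, and the Branches would then continue from the refreshed Anchor. Zeroing coordinates where the Branches disagree is one way to keep conflicting paradigm-specific updates from cancelling or overwriting each other in the shared weights.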