🤖 AI Summary
Existing large language models (LLMs) perform well on static, single-turn medical question answering but struggle with the multi-turn iterative reasoning required in clinical consultations, primarily because clinical facts in dialogue histories are unstructured and loosely connected. To address this, we propose TriMediQ, a novel framework that (1) automatically converts patient dialogues into structured clinical triplets grounded in a knowledge graph to enable multi-hop reasoning; (2) adopts a two-stage training strategy, freezing the LLM backbone while fine-tuning only a lightweight graph projection module, to ensure both reasoning consistency and deployment efficiency; and (3) integrates structured knowledge via a prompt-driven triplet generator and a graph encoder–projection module. Evaluated on interactive medical QA benchmarks, TriMediQ achieves up to a 10.4% absolute accuracy improvement over five strong baselines on iMedQA, significantly enhancing the reliability and interpretability of LLMs in interactive medical diagnosis.
📝 Abstract
Large Language Models (LLMs) perform strongly on static, single-turn medical Question Answering (QA) benchmarks, yet such settings diverge from the iterative information-gathering process required in practical clinical consultations. The MEDIQ framework addresses this mismatch by recasting diagnosis as an interactive dialogue between a patient and an expert system, but the reliability of LLMs drops dramatically when they must reason over dialogue logs in which clinical facts are scattered across sentences without explicit links. To bridge this gap, we introduce TriMediQ, a triplet-structured approach that summarises patient responses into triplets and integrates them into a Knowledge Graph (KG), enabling multi-hop reasoning. We introduce a frozen triplet generator that extracts clinically relevant triplets, using prompts designed to ensure factual consistency. In parallel, a trainable projection module, comprising a graph encoder and a projector, captures relational information from the KG to enhance expert reasoning. TriMediQ operates in two steps: (i) fine-tuning the projection module with all LLM weights frozen; and (ii) using the fine-tuned module to guide multi-hop reasoning during inference. We evaluate TriMediQ on two interactive QA benchmarks, showing that it achieves up to a 10.4% improvement in accuracy over five baselines on the iMedQA dataset. These results demonstrate that converting patient responses into structured triplet-based graphs enables more accurate clinical reasoning in multi-turn settings, providing a practical path toward deploying LLM-based medical assistants.
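The triplet-to-graph pipeline described above can be illustrated with a minimal sketch. All names here are hypothetical: the paper's actual triplet generator is a frozen, prompt-driven LLM, which this toy `extract_triplets` stub replaces with simple line parsing, and the multi-hop step stands in for the learned graph encoder–projector.

```python
from collections import defaultdict

def extract_triplets(patient_turn: str) -> list[tuple[str, str, str]]:
    # Stand-in for the frozen LLM triplet generator: parse lines already
    # formatted as "subject | relation | object".
    triplets = []
    for line in patient_turn.strip().splitlines():
        parts = [p.strip() for p in line.split("|")]
        if len(parts) == 3:
            triplets.append((parts[0], parts[1], parts[2]))
    return triplets

class KnowledgeGraph:
    """Accumulates triplets across dialogue turns into an adjacency map."""

    def __init__(self):
        self.edges = defaultdict(list)  # subject -> [(relation, object)]

    def add(self, triplets):
        for s, r, o in triplets:
            self.edges[s].append((r, o))

    def multi_hop(self, start: str, hops: int) -> set[str]:
        # Collect entities reachable from `start` within `hops` steps,
        # mimicking multi-hop reasoning over accumulated patient facts.
        frontier, seen = {start}, set()
        for _ in range(hops):
            nxt = set()
            for node in frontier:
                for _, obj in self.edges[node]:
                    if obj not in seen:
                        seen.add(obj)
                        nxt.add(obj)
            frontier = nxt
        return seen

# Usage: facts from successive patient turns accumulate into one graph.
kg = KnowledgeGraph()
kg.add(extract_triplets(
    "patient | reports | chest pain\nchest pain | radiates_to | left arm"
))
print(kg.multi_hop("patient", hops=2))  # entities within two hops of "patient"
```

The point of the sketch is the interface, not the components: each patient response becomes a small set of edges, and the expert model reasons over paths in the accumulated graph rather than over raw dialogue text.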