HyperWalker: Dynamic Hypergraph-Based Deep Diagnosis for Multi-Hop Clinical Modeling across EHR and X-Ray in Medical VLMs

📅 2026-01-20
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the limitation of existing medical vision-language models, which typically process samples in isolation and neglect the high-order associations among longitudinal electronic health records (EHRs) and related cases, thereby relying solely on image information for diagnosis. To overcome this, the authors propose HyperWalker, a novel framework that integrates dynamic hypergraphs with test-time training. It constructs an iBrochure hypergraph to model high-order relationships in multimodal clinical data and employs a reinforcement learning agent, Walker, to navigate this hypergraph via a multi-hop orthogonal retrieval strategy to discover optimal diagnostic pathways. Evaluated on the MIMIC medical report generation and EHRXQA medical visual question answering tasks, HyperWalker achieves state-of-the-art performance, significantly enhancing multi-hop clinical reasoning across EHRs and medical imaging.

📝 Abstract
Automated clinical diagnosis remains a core challenge in medical AI, typically requiring models to integrate multimodal data and reason across complex, case-specific contexts. Although recent methods have advanced medical report generation (MRG) and visual question answering (VQA) with medical vision-language models (VLMs), they predominantly operate under a sample-isolated inference paradigm, processing cases independently without access to longitudinal electronic health records (EHRs) or structurally related patient examples. This paradigm limits reasoning to image-derived information alone and ignores complementary external medical evidence that could support more accurate diagnosis. To overcome this limitation, we propose **HyperWalker**, a *Deep Diagnosis* framework that reformulates clinical reasoning via dynamic hypergraphs and test-time training. First, we construct a dynamic hypergraph, termed **iBrochure**, to model the structural heterogeneity of EHR data and the implicit high-order associations among multimodal clinical information. Within this hypergraph, a reinforcement learning agent, **Walker**, navigates to identify optimal diagnostic paths. To ensure comprehensive coverage of the diverse clinical characteristics of test samples, we incorporate a *linger mechanism*, a multi-hop orthogonal retrieval strategy that iteratively selects clinically complementary neighborhood cases reflecting distinct clinical attributes. Experiments on MRG with MIMIC and medical VQA on EHRXQA demonstrate that HyperWalker achieves state-of-the-art performance. Code is available at: https://github.com/Bean-Young/HyperWalker
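The abstract describes the linger mechanism as a multi-hop orthogonal retrieval strategy that iteratively selects complementary neighborhood cases. The paper does not publish this algorithm's details here, but the idea can be sketched as a greedy selection loop that, at each hop, rewards relevance to the query case while penalizing overlap with cases already retrieved. Everything below (the function name `linger_retrieve`, the trade-off weight `alpha`, the cosine-style scoring) is an illustrative assumption, not the authors' implementation:

```python
import math

def linger_retrieve(query, candidates, hops=3, alpha=2.0):
    """Hypothetical sketch of multi-hop orthogonal retrieval:
    at each hop, pick the candidate embedding most relevant to the
    query but least redundant with already-selected cases."""
    def unit(v):
        n = math.sqrt(sum(x * x for x in v)) or 1.0
        return [x / n for x in v]

    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    q = unit(query)
    pool = [unit(c) for c in candidates]
    remaining = list(range(len(pool)))
    selected = []
    for _ in range(hops):
        if not remaining:
            break

        def score(i):
            rel = dot(q, pool[i])  # relevance to the query case
            # redundancy: largest overlap with any already-selected case
            red = max((abs(dot(pool[i], pool[j])) for j in selected),
                      default=0.0)
            return rel - alpha * red  # complementarity trade-off

        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected

# With alpha large enough, the second hop prefers an orthogonal case
# over a near-duplicate of the first pick:
picks = linger_retrieve([1.0, 0.0],
                        [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]],
                        hops=2)
print(picks)  # → [0, 1]
```

The penalty weight `alpha` controls how strongly the retrieval favors distinct clinical attributes over raw similarity; the actual HyperWalker agent learns its traversal policy via reinforcement learning rather than a fixed greedy score.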
Problem

Research questions and friction points this paper is trying to address.

medical VLMs
multi-hop clinical reasoning
electronic health records (EHR)
sample-isolated inference
multimodal clinical diagnosis
Innovation

Methods, ideas, or system contributions that make the work stand out.

dynamic hypergraph
deep diagnosis
medical vision-language models
reinforcement learning agent
multi-hop retrieval