🤖 AI Summary
To address the challenges of visual-semantic gap and information redundancy in automated pathological report generation from whole-slide images (WSIs), this paper proposes a history-report-guided bimodal concurrent learning framework. The method introduces a knowledge retrieval mechanism that matches high-attention image regions with entries in a medical knowledge base; designs learnable visual and textual tokens for dynamic extraction of salient features; enforces cross-modal alignment via weight sharing; and integrates dual-stream representations through a multimodal decoder. Evaluated on the PathText (BRCA) dataset, the framework achieves a 7.4% relative improvement in natural language processing metrics and a 19.1% gain in HER2 classification accuracy. Ablation studies confirm the significant contribution of each component. Overall, the approach substantially enhances semantic expressiveness while effectively suppressing redundancy in generated reports.
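The retrieval step described above can be sketched in a few lines. This is a hypothetical illustration, not the paper's code: patch features, attention scores, the knowledge bank, and all function names and shapes are assumptions; the idea shown is simply "take the most-attended patches, then fetch their nearest knowledge-bank entries by cosine similarity."

```python
import numpy as np

def retrieve_knowledge(patch_feats, attn_scores, knowledge_bank,
                       top_patches=2, top_entries=1):
    """Select the highest-attention WSI patches, then retrieve their
    nearest knowledge-bank entries by cosine similarity (illustrative)."""
    idx = np.argsort(attn_scores)[::-1][:top_patches]   # most-attended patches
    queries = patch_feats[idx]                          # (top_patches, d)
    q = queries / np.linalg.norm(queries, axis=1, keepdims=True)
    k = knowledge_bank / np.linalg.norm(knowledge_bank, axis=1, keepdims=True)
    sim = q @ k.T                                       # cosine similarities
    hits = np.argsort(sim, axis=1)[:, ::-1][:, :top_entries]
    return hits  # indices of retrieved knowledge entries per patch

rng = np.random.default_rng(0)
patches = rng.normal(size=(8, 4))   # 8 patch embeddings (toy dimensions)
attn = rng.random(8)                # attention score per patch
bank = rng.normal(size=(16, 4))     # 16 knowledge-bank entries
print(retrieve_knowledge(patches, attn, bank).shape)  # (2, 1)
```

In a real pipeline the knowledge bank would hold embedded entries from historical reports or a medical ontology, and retrieval would feed the matched entries to the text branch.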
📝 Abstract
Automated pathology report generation from Whole Slide Images (WSIs) faces two key challenges: (1) lack of semantic content in visual features and (2) inherent information redundancy in WSIs. To address these issues, we propose a novel Historical Report Guided Bi-modal Concurrent Learning Framework for Pathology Report Generation (BiGen) emulating pathologists' diagnostic reasoning, consisting of: (1) a knowledge retrieval mechanism that provides rich semantic content by retrieving WSI-relevant knowledge from a pre-built medical knowledge bank via high-attention patch matching, and (2) a bi-modal concurrent learning strategy, instantiated via a learnable visual token and a learnable textual token, that dynamically extracts key visual features and retrieved knowledge, with weight-shared layers enabling cross-modal alignment between visual features and knowledge features. Our multi-modal decoder integrates both modalities for comprehensive diagnostic report generation. Experiments on the PathText (BRCA) dataset demonstrate our framework's superiority, achieving state-of-the-art performance with a 7.4% relative improvement in NLP metrics and a 19.1% enhancement in classification metrics for Her-2 prediction versus existing methods. Ablation studies validate the necessity of our proposed modules, highlighting our method's ability to provide WSI-relevant rich semantic content and suppress information redundancy in WSIs. Code is publicly available at https://github.com/DeepMed-Lab-ECNU/BiGen.
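The bi-modal concurrent learning idea, a learnable token per modality pooling its feature stream through a weight-shared layer, can be sketched as follows. All names, shapes, and the attention-pooling form are assumptions for illustration; the paper's actual architecture may differ.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def attend(token, feats, W_shared):
    """A learnable token attends over a feature stream; the projection
    W_shared is reused for both modalities, which is what encourages
    cross-modal alignment in this sketch."""
    scores = softmax(feats @ token)   # (n,) attention over features
    pooled = scores @ feats           # (d,) attention-weighted pooling
    return pooled @ W_shared          # shared projection layer

rng = np.random.default_rng(1)
d = 4
vis_token = rng.normal(size=d)              # learnable visual token (frozen here)
txt_token = rng.normal(size=d)              # learnable textual token
patch_feats = rng.normal(size=(6, d))       # WSI patch features
knowledge_feats = rng.normal(size=(5, d))   # retrieved knowledge features

W = rng.normal(size=(d, d))                 # weight-shared layer
v = attend(vis_token, patch_feats, W)       # visual stream representation
t = attend(txt_token, knowledge_feats, W)   # textual stream representation
print(v.shape, t.shape)  # (4,) (4,)
```

The two pooled representations `v` and `t` would then be passed to the multi-modal decoder to condition report generation on both streams.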