Historical Report Guided Bi-modal Concurrent Learning for Pathology Report Generation

📅 2025-06-23
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
To address the visual-semantic gap and information redundancy that hamper automated pathology report generation from whole-slide images (WSIs), this paper proposes a historical-report-guided bi-modal concurrent learning framework (BiGen). The method introduces a knowledge retrieval mechanism that matches high-attention image regions against entries in a pre-built medical knowledge bank; designs learnable visual and textual tokens for dynamic extraction of salient features; enforces cross-modal alignment via weight-shared layers; and integrates the dual-stream representations through a multi-modal decoder. Evaluated on the PathText (BRCA) dataset, the framework achieves a 7.4% relative improvement in NLP metrics and a 19.1% gain in HER2 classification metrics over existing methods. Ablation studies confirm the contribution of each component. Overall, the approach enhances semantic expressiveness while suppressing redundancy in the generated reports.
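
The retrieval step, as described, scores knowledge-bank entries against the WSI's high-attention patches. A minimal sketch of how such attention-guided matching could be implemented (PyTorch; all names, dimensions, and the cosine-similarity scoring are assumptions, not the paper's exact procedure):

```python
import torch
import torch.nn.functional as F

def retrieve_knowledge(patch_feats, attn_scores, knowledge_bank,
                       top_patches=32, top_entries=8):
    """Match the highest-attention WSI patches against a bank of
    knowledge embeddings and return indices of the best entries.

    patch_feats:    (N, D) patch embeddings for one WSI
    attn_scores:    (N,)   attention weight per patch
    knowledge_bank: (K, D) embeddings of knowledge-bank entries
    """
    # Keep only the patches the attention module deems salient.
    idx = attn_scores.topk(min(top_patches, attn_scores.numel())).indices
    salient = F.normalize(patch_feats[idx], dim=-1)    # (P, D)
    bank = F.normalize(knowledge_bank, dim=-1)         # (K, D)

    # Cosine similarity of every bank entry to its best-matching
    # salient patch gives a per-entry relevance score.
    relevance = (salient @ bank.T).amax(dim=0)         # (K,)
    return relevance.topk(min(top_entries, relevance.numel())).indices
```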

📝 Abstract
Automated pathology report generation from Whole Slide Images (WSIs) faces two key challenges: (1) lack of semantic content in visual features and (2) inherent information redundancy in WSIs. To address these issues, we propose a novel Historical Report Guided Bi-modal Concurrent Learning Framework for Pathology Report Generation (BiGen), emulating pathologists' diagnostic reasoning, consisting of: (1) a knowledge retrieval mechanism that provides rich semantic content by retrieving WSI-relevant knowledge from a pre-built medical knowledge bank through matching of high-attention patches, and (2) a bi-modal concurrent learning strategy, instantiated via a learnable visual token and a learnable textual token, to dynamically extract key visual features and retrieved knowledge, where weight-shared layers enable cross-modal alignment between visual features and knowledge features. Our multi-modal decoder integrates both modalities for comprehensive diagnostic report generation. Experiments on the PathText (BRCA) dataset demonstrate our framework's superiority, achieving state-of-the-art performance with a 7.4% relative improvement in NLP metrics and a 19.1% enhancement in classification metrics for HER2 prediction versus existing methods. Ablation studies validate the necessity of the proposed modules, highlighting the method's ability to provide WSI-relevant rich semantic content and suppress information redundancy in WSIs. Code is publicly available at https://github.com/DeepMed-Lab-ECNU/BiGen.
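
A compact sketch of the bi-modal concurrent learning idea described above: one learnable query token per modality summarizes its feature stream through a weight-shared attention layer, so the two summaries land in an aligned space (PyTorch; module names and sizes are hypothetical, a sketch of the idea rather than the authors' implementation):

```python
import torch
import torch.nn as nn

class BiModalConcurrentLearner(nn.Module):
    """One learnable query token per modality attends over its own
    feature stream; both streams pass through the same (weight-shared)
    layer so the resulting summaries live in an aligned space."""

    def __init__(self, dim=512, heads=8):
        super().__init__()
        self.visual_token = nn.Parameter(torch.randn(1, 1, dim))
        self.text_token = nn.Parameter(torch.randn(1, 1, dim))
        # A single cross-attention module reused for both modalities
        # plays the role of the weight-shared alignment layers.
        self.shared_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.shared_proj = nn.Linear(dim, dim)

    def summarize(self, token, feats):
        # token: (1, 1, D) learnable query; feats: (B, N, D) stream.
        q = token.expand(feats.size(0), -1, -1)
        pooled, _ = self.shared_attn(q, feats, feats)
        return self.shared_proj(pooled)                # (B, 1, D)

    def forward(self, patch_feats, knowledge_feats):
        v = self.summarize(self.visual_token, patch_feats)
        t = self.summarize(self.text_token, knowledge_feats)
        # Concatenated dual-stream representation for the decoder.
        return torch.cat([v, t], dim=1)                # (B, 2, D)
```

Reusing the same attention and projection weights across the two calls is what stands in here for the paper's weight-shared layers; the concatenated output is the dual-stream representation handed to the decoder.
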
Problem

Research questions and friction points this paper is trying to address.

Lack of semantic content in WSI visual features
Inherent information redundancy in Whole Slide Images
Need for comprehensive pathology report generation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Historical report guided bi-modal learning
Knowledge retrieval from a pre-built medical knowledge bank
Cross-modal alignment via weight-shared layers (see the decoder sketch after this list)
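
Per the abstract, a multi-modal decoder integrates the two streams into the report. A minimal wiring sketch using a stock Transformer decoder (dimensions, layer counts, and the use of `nn.TransformerDecoder` are illustrative assumptions; the paper's decoder may differ):

```python
import torch
import torch.nn as nn

# Dual-stream summaries (e.g., from the learner sketched earlier) act as
# the decoder's memory; report tokens are predicted autoregressively.
layer = nn.TransformerDecoderLayer(d_model=512, nhead=8, batch_first=True)
decoder = nn.TransformerDecoder(layer, num_layers=4)

report_emb = torch.randn(2, 40, 512)    # embedded (shifted) report tokens
dual_stream = torch.randn(2, 2, 512)    # [visual summary, knowledge summary]

# Causal mask so each report position only attends to earlier positions.
mask = torch.triu(torch.full((40, 40), float('-inf')), diagonal=1)
hidden = decoder(tgt=report_emb, memory=dual_stream, tgt_mask=mask)  # (2, 40, 512)
# A vocabulary projection head on `hidden` would yield token logits.
```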
Ling Zhang
Alibaba DAMO Academy USA
Medical Image Analysis · Medical Image Computing · Machine Learning · Image Processing
Boxiang Yun
East China Normal University
Medical Image Processing
Qingli Li
Shanghai Key Laboratory of Multidimensional Information Processing, East China Normal University, Shanghai, China
Yan Wang
Shanghai Key Laboratory of Multidimensional Information Processing, East China Normal University, Shanghai, China