🤖 AI Summary
To address the challenges of visual-semantic gap and information redundancy in automated pathological report generation from whole-slide images (WSIs), this paper proposes a history-report-guided bimodal concurrent learning framework. The method introduces a knowledge retrieval mechanism that matches high-attention image regions with entries in a medical knowledge base; designs learnable visual and textual tokens for dynamic extraction of salient features; enforces cross-modal alignment via weight sharing; and integrates dual-stream representations through a multimodal decoder. Evaluated on the PathText (BRCA) dataset, the framework achieves a 7.4% relative improvement in natural language processing metrics and a 19.1% gain in HER2 classification accuracy. Ablation studies confirm the significant contribution of each component. Overall, the approach substantially enhances semantic expressiveness while effectively suppressing redundancy in generated reports.
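The retrieval step described above can be sketched in a few lines. This is a hypothetical illustration, not the paper's code: patch features, attention scores, the knowledge bank, and all function names and shapes are assumptions; the idea shown is simply "take the most-attended patches, then fetch their nearest knowledge-bank entries by cosine similarity."

```python
import numpy as np

def retrieve_knowledge(patch_feats, attn_scores, knowledge_bank,
                       top_patches=2, top_entries=1):
    """Select the highest-attention WSI patches, then retrieve their
    nearest knowledge-bank entries by cosine similarity (illustrative)."""
    idx = np.argsort(attn_scores)[::-1][:top_patches]   # most-attended patches
    queries = patch_feats[idx]                          # (top_patches, d)
    q = queries / np.linalg.norm(queries, axis=1, keepdims=True)
    k = knowledge_bank / np.linalg.norm(knowledge_bank, axis=1, keepdims=True)
    sim = q @ k.T                                       # cosine similarities
    hits = np.argsort(sim, axis=1)[:, ::-1][:, :top_entries]
    return hits  # indices of retrieved knowledge entries per patch

rng = np.random.default_rng(0)
patches = rng.normal(size=(8, 4))   # 8 patch embeddings (toy dimensions)
attn = rng.random(8)                # attention score per patch
bank = rng.normal(size=(16, 4))     # 16 knowledge-bank entries
print(retrieve_knowledge(patches, attn, bank).shape)  # (2, 1)
```

In a real pipeline the knowledge bank would hold embedded entries from historical reports or a medical ontology, and retrieval would feed the matched entries to the text branch.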
📝 Abstract
Automated pathology report generation from Whole Slide Images (WSIs) faces two key challenges: (1) lack of semantic content in visual features and (2) inherent information redundancy in WSIs. To address these issues, we propose a novel Historical Report Guided Bi-modal Concurrent Learning Framework for Pathology Report Generation (BiGen) emulating pathologists' diagnostic reasoning, consisting of: (1) a knowledge retrieval mechanism that provides rich semantic content by retrieving WSI-relevant knowledge from a pre-built medical knowledge bank via high-attention patch matching, and (2) a bi-modal concurrent learning strategy, instantiated via a learnable visual token and a learnable textual token, that dynamically extracts key visual features and retrieved knowledge, with weight-shared layers enabling cross-modal alignment between visual features and knowledge features. Our multi-modal decoder integrates both modalities for comprehensive diagnostic report generation. Experiments on the PathText (BRCA) dataset demonstrate our framework's superiority, achieving state-of-the-art performance with a 7.4% relative improvement in NLP metrics and a 19.1% enhancement in classification metrics for Her-2 prediction versus existing methods. Ablation studies validate the necessity of our proposed modules, highlighting our method's ability to provide WSI-relevant rich semantic content and suppress information redundancy in WSIs. Code is publicly available at https://github.com/DeepMed-Lab-ECNU/BiGen.
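The bi-modal concurrent learning idea, a learnable token per modality pooling its feature stream through a weight-shared layer, can be sketched as follows. All names, shapes, and the attention-pooling form are assumptions for illustration; the paper's actual architecture may differ.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def attend(token, feats, W_shared):
    """A learnable token attends over a feature stream; the projection
    W_shared is reused for both modalities, which is what encourages
    cross-modal alignment in this sketch."""
    scores = softmax(feats @ token)   # (n,) attention over features
    pooled = scores @ feats           # (d,) attention-weighted pooling
    return pooled @ W_shared          # shared projection layer

rng = np.random.default_rng(1)
d = 4
vis_token = rng.normal(size=d)              # learnable visual token (frozen here)
txt_token = rng.normal(size=d)              # learnable textual token
patch_feats = rng.normal(size=(6, d))       # WSI patch features
knowledge_feats = rng.normal(size=(5, d))   # retrieved knowledge features

W = rng.normal(size=(d, d))                 # weight-shared layer
v = attend(vis_token, patch_feats, W)       # visual stream representation
t = attend(txt_token, knowledge_feats, W)   # textual stream representation
print(v.shape, t.shape)  # (4,) (4,)
```

The two pooled representations `v` and `t` would then be passed to the multi-modal decoder to condition report generation on both streams.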