MergeSlide: Continual Model Merging and Task-to-Class Prompt-Aligned Inference for Lifelong Learning on Whole Slide Images

📅 2025-11-17

🤖 AI Summary
To address catastrophic forgetting, unknown task identity at inference, and high resource overhead in lifelong learning for whole-slide image (WSI) cancer analysis, this paper reformulates lifelong learning as a model merging problem, reportedly the first such formulation. It proposes Orthogonal Continual Merging (OCM), a parameter-efficient model integration strategy, and Task-to-Class Prompt-aligned (TCP) inference, a mechanism that does not require task identity. The approach builds on a vision-language pathology foundation model, defines each new task with class-aware prompts, and fine-tunes an MLP-free backbone for a few epochs. OCM merges each new task model into the unified model while preserving orthogonality among task-specific representations, which mitigates forgetting; TCP first selects the most relevant task with task-level prompts and then predicts with that task's class-aware prompts. Experiments on a stream of six TCGA datasets show that the method outperforms rehearsal-based continual learning and vision-language zero-shot baselines in the class-incremental setting while keeping computational and memory overhead low.

📝 Abstract
Lifelong learning on Whole Slide Images (WSIs) aims to train or fine-tune a unified model sequentially on cancer-related tasks, reducing the resources and effort required for data transfer and processing, especially given the gigabyte-scale size of WSIs. In this paper, we introduce MergeSlide, a simple yet effective framework that treats lifelong learning as a model merging problem by leveraging a vision-language pathology foundation model. When a new task arrives, it is: 1) defined with class-aware prompts, 2) fine-tuned for a few epochs using an MLP-free backbone, and 3) merged into a unified model using an orthogonal continual merging strategy that preserves performance and mitigates catastrophic forgetting. For inference under the class-incremental learning (Class-IL) setting, where task identity is unknown, we introduce Task-to-Class Prompt-aligned (TCP) inference. Specifically, TCP first identifies the most relevant task using task-level prompts and then applies the corresponding class-aware prompts to generate predictions. To evaluate MergeSlide, we conduct experiments on a stream of six TCGA datasets. The results show that MergeSlide outperforms both rehearsal-based continual learning and vision-language zero-shot baselines. Code and data are available at https://github.com/caodoanh2001/MergeSlide.
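The two-stage TCP inference described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: it assumes that the slide and every prompt have already been embedded into a shared vector space, and that both stages rank candidates by cosine similarity. The function `tcp_predict`, the task and class names, and all embedding values are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def tcp_predict(slide_emb, task_prompts, class_prompts):
    """Pick the most similar task first, then a class within that task.

    Assumption: similarity to prompt embeddings drives both stages.
    """
    # Stage 1: task-level prompts decide which task the slide belongs to.
    task = max(task_prompts, key=lambda t: cosine(slide_emb, task_prompts[t]))
    # Stage 2: only the chosen task's class-aware prompts are compared.
    classes = class_prompts[task]
    label = max(classes, key=lambda c: cosine(slide_emb, classes[c]))
    return task, label


# Hypothetical embeddings for two tasks with two classes each.
task_prompts = {
    "task_a": [1.0, 0.0, 0.0],
    "task_b": [0.0, 1.0, 0.0],
}
class_prompts = {
    "task_a": {"class_a1": [0.9, 0.0, 0.4], "class_a2": [0.9, 0.0, -0.4]},
    "task_b": {"class_b1": [0.0, 0.9, 0.4], "class_b2": [0.0, 0.9, -0.4]},
}
slide_emb = [0.8, 0.1, 0.5]

prediction = tcp_predict(slide_emb, task_prompts, class_prompts)
print(prediction)
```

Restricting stage 2 to the selected task's prompts keeps classes from unrelated tasks out of the comparison, which is the apparent motivation for routing by task before classifying.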
Problem

Research questions and friction points this paper is trying to address.

Develops continual learning for gigabyte-scale whole slide medical images
Prevents catastrophic forgetting when merging new pathology tasks sequentially
Enables task-agnostic inference using vision-language prompt alignment
Innovation

Methods, ideas, or system contributions that make the work stand out.

Leverages vision-language pathology foundation model
Uses orthogonal continual merging strategy
Applies task-to-class prompt-aligned inference method
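
The summary does not say how the orthogonal continual merging strategy enforces orthogonality. As a generic illustration only, and not the authors' algorithm, the sketch below treats each task's fine-tuned weights as a flat "task vector" (fine-tuned minus base) and, before adding a new task vector to the running merge, removes its component along the already merged update. The helper names `task_vector` and `orthogonal_merge` and all numbers are hypothetical.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def task_vector(finetuned, base):
    """Weight change introduced by fine-tuning on one task."""
    return [f - b for f, b in zip(finetuned, base)]


def orthogonal_merge(merged_delta, new_delta):
    """Add only the part of new_delta orthogonal to merged_delta.

    Assumption: a single projection step stands in for whatever
    orthogonality constraint the actual method uses.
    """
    denom = dot(merged_delta, merged_delta)
    if denom == 0.0:
        # Nothing merged yet, so the new update is taken as-is.
        return list(new_delta)
    coeff = dot(new_delta, merged_delta) / denom
    residual = [n - coeff * m for n, m in zip(new_delta, merged_delta)]
    return [m + r for m, r in zip(merged_delta, residual)]


# Hypothetical flattened weights for a base model and two task models.
base = [0.0, 0.0]
task1 = [1.0, 0.0]
task2 = [1.0, 2.0]

merged = [0.0, 0.0]
for finetuned in (task1, task2):
    merged = orthogonal_merge(merged, task_vector(finetuned, base))

unified = [b + m for b, m in zip(base, merged)]
print(unified)
```

In this toy run the second task contributes only the direction the first task did not already occupy, so the first task's update is left unchanged after merging.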