MergeSlide: Continual Model Merging and Task-to-Class Prompt-Aligned Inference for Lifelong Learning on Whole Slide Images

📅 2025-11-17
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
To address catastrophic forgetting, unknown task identities, and high resource overhead in lifelong learning for whole-slide image (WSI) cancer analysis, this paper is the first to reformulate lifelong learning as a model merging problem. We propose Orthogonal Continual Merging (OCM), a parameter-efficient model integration strategy, and Task-to-Class Prompt Alignment (TCP), a task-agnostic inference mechanism. Our approach leverages a vision-language pathology foundation model, defines each new task via class-aware prompts, and fine-tunes an MLP-free backbone. OCM enables incremental model integration while preserving orthogonality among task-specific representations; TCP enables accurate inference without task identifiers, substantially mitigating forgetting. Experiments across six TCGA data streams demonstrate that our method outperforms replay-based continual learning and zero-shot baselines in class-incremental learning while maintaining low computational and memory overhead.
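The summary above describes merging each new task's model while "preserving orthogonality among task-specific representations". A minimal sketch of what such an orthogonal merge could look like, assuming (this detail is not spelled out here) that each task's fine-tuning update is flattened into a vector and projected onto the orthogonal complement of previously merged task directions before being added:

```python
import numpy as np

def orthogonal_merge(merged_delta, task_deltas, new_delta):
    """Illustrative sketch of an orthogonal continual merge.

    Assumption (not stated in the summary): Gram-Schmidt-style projection
    removal keeps the new task's update orthogonal to earlier task updates,
    so earlier task-specific components of the merged model are untouched.
    """
    for d in task_deltas:
        denom = float(np.dot(d, d))
        if denom > 0:
            # Remove the component of the new update along an old task direction
            new_delta = new_delta - (np.dot(new_delta, d) / denom) * d
    # Add the orthogonalised update into the running merged model
    return merged_delta + new_delta, task_deltas + [new_delta]
```

With this sketch, a second task whose raw update overlaps the first still merges a component that is exactly orthogonal to it, which is the property the summary attributes to OCM.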

๐Ÿ“ Abstract
Lifelong learning on Whole Slide Images (WSIs) aims to train or fine-tune a unified model sequentially on cancer-related tasks, reducing the resources and effort required for data transfer and processing, especially given the gigabyte-scale size of WSIs. In this paper, we introduce MergeSlide, a simple yet effective framework that treats lifelong learning as a model merging problem by leveraging a vision-language pathology foundation model. When a new task arrives, it is: 1) defined with class-aware prompts, 2) fine-tuned for a few epochs using an MLP-free backbone, and 3) merged into a unified model using an orthogonal continual merging strategy that preserves performance and mitigates catastrophic forgetting. For inference under the class-incremental learning (CLASS-IL) setting, where task identity is unknown, we introduce Task-to-Class Prompt-aligned (TCP) inference. Specifically, TCP first identifies the most relevant task using task-level prompts and then applies the corresponding class-aware prompts to generate predictions. To evaluate MergeSlide, we conduct experiments on a stream of six TCGA datasets. The results show that MergeSlide outperforms both rehearsal-based continual learning and vision-language zero-shot baselines. Code and data are available at https://github.com/caodoanh2001/MergeSlide.
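The abstract describes TCP inference as a two-stage process: first select the most relevant task from task-level prompts, then predict among that task's class-aware prompts. A minimal sketch, assuming CLIP-style L2-normalised embeddings scored by dot product (an illustrative choice, not confirmed by the abstract):

```python
import numpy as np

def tcp_inference(slide_emb, task_prompt_embs, class_prompt_embs):
    """Illustrative sketch of Task-to-Class Prompt-aligned (TCP) inference.

    slide_emb:         (d,) slide-level embedding
    task_prompt_embs:  list of (d,) task-level prompt embeddings, one per task
    class_prompt_embs: list of (n_classes_t, d) class-prompt arrays, one per task
    """
    # Stage 1: identify the most relevant task via task-level prompts
    task_scores = [float(np.dot(slide_emb, t)) for t in task_prompt_embs]
    task_id = int(np.argmax(task_scores))
    # Stage 2: classify with the selected task's class-aware prompts
    class_scores = class_prompt_embs[task_id] @ slide_emb
    return task_id, int(np.argmax(class_scores))
```

Because the task is inferred from the slide itself, no task identifier is needed at test time, which is what makes the scheme usable in the class-incremental (CLASS-IL) setting.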
Problem

Research questions and friction points this paper is trying to address.

Develops continual learning for gigabyte-scale whole slide medical images
Prevents catastrophic forgetting when merging new pathology tasks sequentially
Enables task-agnostic inference using vision-language prompt alignment
Innovation

Methods, ideas, or system contributions that make the work stand out.

Leverages vision-language pathology foundation model
Uses orthogonal continual merging strategy
Applies task-to-class prompt-aligned inference method
Doanh C. Bui
Nara Institute of Science and Technology, Japan
Deep Learning · Computer Vision

Ba Hung Ngo
Graduate School of Data Science, Chonnam National University, South Korea

Hoai Luan Pham
Nara Institute of Science and Technology, Japan

Khang Nguyen
UIT
Artificial Intelligence · Computer Vision · Timetabling

Maï K. Nguyen
ETIS (UMR 8051), CY Cergy Paris University, ENSEA, CNRS, France

Yasuhiko Nakashima
Nara Institute of Science and Technology, Japan