MergeSlide: Continual Model Merging and Task-to-Class Prompt-Aligned Inference for Lifelong Learning on Whole Slide Images

📅 2025-11-17
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
To address catastrophic forgetting, unknown task identities, and high resource overhead in lifelong learning for whole-slide image (WSI) cancer analysis, this paper is the first to reformulate lifelong learning as a model merging problem. We propose Orthogonal Continual Merging (OCM), a parameter-efficient model integration strategy, and Task-to-Class Prompt Alignment (TCP), a task-agnostic inference mechanism. Our approach leverages a vision-language pathology foundation model, defines each new task via class-aware prompts, and fine-tunes an MLP-free backbone. OCM enables incremental model integration while preserving orthogonality among task-specific representations; TCP enables accurate inference without task identifiers, substantially mitigating forgetting. Experiments across six TCGA data streams demonstrate that our method outperforms replay-based continual learning and zero-shot baselines in class-incremental learning while maintaining low computational and memory overhead.
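The summary above describes merging each new task's model while "preserving orthogonality among task-specific representations". A minimal sketch of what such an orthogonal merge could look like, assuming (this detail is not spelled out here) that each task's fine-tuning update is flattened into a vector and projected onto the orthogonal complement of previously merged task directions before being added:

```python
import numpy as np

def orthogonal_merge(merged_delta, task_deltas, new_delta):
    """Illustrative sketch of an orthogonal continual merge.

    Assumption (not stated in the summary): Gram-Schmidt-style projection
    removal keeps the new task's update orthogonal to earlier task updates,
    so earlier task-specific components of the merged model are untouched.
    """
    for d in task_deltas:
        denom = float(np.dot(d, d))
        if denom > 0:
            # Remove the component of the new update along an old task direction
            new_delta = new_delta - (np.dot(new_delta, d) / denom) * d
    # Add the orthogonalised update into the running merged model
    return merged_delta + new_delta, task_deltas + [new_delta]
```

With this sketch, a second task whose raw update overlaps the first still merges a component that is exactly orthogonal to it, which is the property the summary attributes to OCM.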

๐Ÿ“ Abstract
Lifelong learning on Whole Slide Images (WSIs) aims to train or fine-tune a unified model sequentially on cancer-related tasks, reducing the resources and effort required for data transfer and processing, especially given the gigabyte-scale size of WSIs. In this paper, we introduce MergeSlide, a simple yet effective framework that treats lifelong learning as a model merging problem by leveraging a vision-language pathology foundation model. When a new task arrives, it is: 1) defined with class-aware prompts, 2) fine-tuned for a few epochs using an MLP-free backbone, and 3) merged into a unified model using an orthogonal continual merging strategy that preserves performance and mitigates catastrophic forgetting. For inference under the class-incremental learning (CLASS-IL) setting, where task identity is unknown, we introduce Task-to-Class Prompt-aligned (TCP) inference. Specifically, TCP first identifies the most relevant task using task-level prompts and then applies the corresponding class-aware prompts to generate predictions. To evaluate MergeSlide, we conduct experiments on a stream of six TCGA datasets. The results show that MergeSlide outperforms both rehearsal-based continual learning and vision-language zero-shot baselines. Code and data are available at https://github.com/caodoanh2001/MergeSlide.
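The abstract describes TCP inference as a two-stage process: first select the most relevant task from task-level prompts, then predict among that task's class-aware prompts. A minimal sketch, assuming CLIP-style L2-normalised embeddings scored by dot product (an illustrative choice, not confirmed by the abstract):

```python
import numpy as np

def tcp_inference(slide_emb, task_prompt_embs, class_prompt_embs):
    """Illustrative sketch of Task-to-Class Prompt-aligned (TCP) inference.

    slide_emb:         (d,) slide-level embedding
    task_prompt_embs:  list of (d,) task-level prompt embeddings, one per task
    class_prompt_embs: list of (n_classes_t, d) class-prompt arrays, one per task
    """
    # Stage 1: identify the most relevant task via task-level prompts
    task_scores = [float(np.dot(slide_emb, t)) for t in task_prompt_embs]
    task_id = int(np.argmax(task_scores))
    # Stage 2: classify with the selected task's class-aware prompts
    class_scores = class_prompt_embs[task_id] @ slide_emb
    return task_id, int(np.argmax(class_scores))
```

Because the task is inferred from the slide itself, no task identifier is needed at test time, which is what makes the scheme usable in the class-incremental (CLASS-IL) setting.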
Problem

Research questions and friction points this paper is trying to address.

Develops continual learning for gigabyte-scale whole slide medical images
Prevents catastrophic forgetting when merging new pathology tasks sequentially
Enables task-agnostic inference using vision-language prompt alignment
Innovation

Methods, ideas, or system contributions that make the work stand out.

Leverages vision-language pathology foundation model
Uses orthogonal continual merging strategy
Applies task-to-class prompt-aligned inference method
Doanh C. Bui
Nara Institute of Science and Technology, Japan
Deep Learning · Computer Vision

Ba Hung Ngo
Graduate School of Data Science, Chonnam National University, South Korea

Hoai Luan Pham
Nara Institute of Science and Technology, Japan

Khang Nguyen
UIT
Artificial Intelligence · Computer Vision · Timetabling

Maï K. Nguyen
ETIS (UMR 8051), CY Cergy Paris University, ENSEA, CNRS, France

Yasuhiko Nakashima
Nara Institute of Science and Technology, Japan