🤖 AI Summary
Current continual learning (CL) methods do not support multiple instance learning (MIL), which leads to catastrophic forgetting in single-cell blood disease diagnosis when models are updated on streaming clinical data. Method: We propose the first CL framework tailored to MIL. It combines instance-level attention scores with distances to the bag mean and class mean to select discriminative, representative instances for the exemplar sets, and it replays these exemplars when training on new tasks to preserve both the diversity of earlier tasks and the stability of learned knowledge. Results: In a class-incremental evaluation on one month of real-world single-cell leukemia data, the framework considerably outperforms established CL baselines, mitigating performance degradation and allowing the model to adapt to shifting clinical data distributions. Contribution: This work is the first to integrate MIL with continual learning, providing an incremental modeling approach with reduced forgetting for streaming medical diagnostics.
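The instance-level attention scores mentioned above typically come from an attention-based MIL pooling layer. The NumPy sketch below assumes such a layer (a tanh projection followed by a softmax over the instances of one bag); the function name `attention_mil_pool`, the parameters `V` and `w`, and the toy dimensions are hypothetical, and this is not the authors' model.

```python
import numpy as np

def attention_mil_pool(instances, V, w):
    """Attention-based MIL pooling over one bag (assumed architecture).

    instances: (n, d) feature matrix, one row per instance (e.g. per single cell)
    V: (h, d) projection matrix and w: (h,) attention vector; both are hypothetical
       learned parameters, drawn at random in the example below
    Returns the bag embedding (d,) and the instance attention scores (n,).
    """
    # Unnormalised attention logit for each instance k: w^T tanh(V x_k)
    logits = np.tanh(instances @ V.T) @ w
    # Softmax over the instances of the bag; subtracting the max avoids overflow
    exp = np.exp(logits - logits.max())
    scores = exp / exp.sum()
    # Bag representation: attention-weighted sum of the instance features
    bag_embedding = scores @ instances
    return bag_embedding, scores

rng = np.random.default_rng(0)
bag = rng.normal(size=(5, 8))   # toy bag: 5 instances with 8 features each
V = rng.normal(size=(4, 8))
w = rng.normal(size=4)
emb, att = attention_mil_pool(bag, V, w)
print(emb.shape, att.shape)
```

Because the scores are normalised with a softmax inside each bag, they sum to one per bag: they rank instances within a bag well, but are not directly comparable across bags of different sizes.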
📝 Abstract
The dynamic environment of laboratories and clinics, where new data arrive daily, requires regular updates of trained machine learning models to maintain consistent performance. Continual learning aims to update models on new data without catastrophic forgetting. However, state-of-the-art methods are ineffective for multiple instance learning (MIL), which is often used in single-cell-based hematologic disease diagnosis (e.g., leukemia detection). Here, we propose the first continual learning method tailored specifically to MIL. Our method is rehearsal-based: it replays a selection of individual instances drawn from various bags. We combine the instance attention score with the distances from the bag mean and class mean vectors to carefully select which samples and instances to store in the exemplar sets of previous tasks, preserving the diversity of the data. Using one month of real-world data from a leukemia laboratory, we study the effectiveness of our approach in a class-incremental scenario and compare it with well-known continual learning methods. We show that our method considerably outperforms state-of-the-art methods, providing the first continual learning approach for MIL. This enables models to adapt to data distributions that shift over time, for example due to changes in disease occurrence or in the underlying genetic alterations.
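The abstract does not specify how the attention score and the two distances are combined. The sketch below assumes a simple weighted sum: each criterion is rescaled to [0, 1] within a bag, highly attended instances close to the class mean are favoured, and distance from the bag mean is rewarded to retain some diversity. The function name `select_exemplar_instances`, the weights `w_bag` and `w_class`, the per-bag budget `k`, and the choice of class mean as the average of bag means are all hypothetical; this illustrates the idea and is not the authors' selection rule.

```python
import numpy as np

def select_exemplar_instances(bags, attentions, labels, k=2, w_bag=0.5, w_class=1.0):
    """Choose up to k instances per bag for a rehearsal (replay) buffer.

    bags:       list of (n_i, d) instance-feature arrays, one per bag (patient sample)
    attentions: list of (n_i,) instance attention scores from the MIL model
    labels:     list of integer class labels, one per bag
    The linear scoring rule and the weights w_bag / w_class are hypothetical.
    Returns {bag index: array of selected instance indices}.
    """
    bag_means = [b.mean(axis=0) for b in bags]
    # Class mean vector (assumed): average of the bag means that share a label
    class_means = {
        c: np.mean([m for m, y in zip(bag_means, labels) if y == c], axis=0)
        for c in set(labels)
    }
    eps = 1e-12
    selected = {}
    for i, (bag, att, y) in enumerate(zip(bags, attentions, labels)):
        # Rescale each criterion to [0, 1] within the bag so the weights are comparable
        att_n = att / (att.max() + eps)
        d_bag = np.linalg.norm(bag - bag_means[i], axis=1)
        d_bag = d_bag / (d_bag.max() + eps)
        d_class = np.linalg.norm(bag - class_means[y], axis=1)
        d_class = d_class / (d_class.max() + eps)
        # Favour highly attended instances that are typical of their class (small d_class)
        # while rewarding spread around the bag mean (large d_bag) to keep some diversity
        score = att_n + w_bag * d_bag - w_class * d_class
        # Indices of the k highest-scoring instances (all of them if the bag is smaller)
        selected[i] = np.argsort(score)[::-1][:k]
    return selected

rng = np.random.default_rng(0)
bags = [rng.normal(size=(n, 4)) for n in (6, 3, 5)]
attentions = [rng.dirichlet(np.ones(len(b))) for b in bags]
labels = [0, 1, 0]
buffer = select_exemplar_instances(bags, attentions, labels, k=2)
for i, idx in buffer.items():
    print(f"bag {i} (class {labels[i]}): keep instances {idx.tolist()}")
```

Setting `w_bag` to 0 reduces the rule to attention plus class-mean proximity. Deciding which bags to keep, and not only which instances, would need an additional step, for instance ranking bags by the distance of their mean to the class mean.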