Automatic Discovery of Disease Subgroups by Contrasting with Healthy Controls

📅 2026-05-20

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses a central challenge in biomedical subgroup discovery: identifying disease subtypes that are interpretable, homogeneous, and driven solely by pathology. To this end, the authors propose Deep UCSL, a contrastive subgroup discovery method. By explicitly modeling both the patterns that patients share with healthy controls and the patterns that differ, Deep UCSL learns a discriminative representation space intended to capture only disease-specific variation. The method derives a new loss function from the conditional joint likelihood of latent clusters and patient/control labels, optimized with an Expectation-Maximization (EM) strategy that alternates between inferring subgroup assignments and updating the deep feature encoder. A regularization term further suppresses variability shared with healthy individuals. Experiments on an MNIST-based illustrative example and four real medical imaging datasets show that Deep UCSL quantitatively improves the quality of the estimated subgroups compared with previous related work.

📝 Abstract

In biomedical Subgroup Discovery, practitioners are interested in discovering interpretable and homogeneous subgroups within a group of patients. In this paper, assuming that healthy subjects (i.e., controls) share common but irrelevant factors of variation with the patients, we motivate and develop a Contrastive Subgroup Discovery method, entitled Deep UCSL. By contrasting patients with controls, Deep UCSL identifies subgroups driven solely by pathological factors, ignoring common variability shared with healthy subjects. Our framework employs a deep feature extractor to learn a discriminative representation space. Mathematically, we derive a novel loss based on the conditional joint likelihood of latent clusters and patient/control labels, optimized via an Expectation-Maximization strategy alternating between subgroup inference and feature encoder updates. A regularization term further encourages representations to capture disease-specific variability while ignoring variability shared with controls. Compared to previous related works, our approach quantitatively improves the quality of the estimated subgroups, as demonstrated on an MNIST example and four distinct real medical imaging datasets. Code and datasets are available at: https://github.com/rlouiset/deep_ucsl.
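
The alternation described in the abstract (infer subgroups, then update the encoder) can be illustrated with a deliberately simplified toy. The sketch below is not the authors' implementation and does not reproduce the Deep UCSL loss or its regularization term. It assumes a linear "encoder" per subgroup (a weighted logistic model separating that subgroup's patients from all controls) and a softmax over the per-subgroup scores as the subgroup-inference step. The function name `toy_contrastive_em`, its parameters, and the synthetic data are all hypothetical.

```python
import numpy as np


def sigmoid(z):
    # Clip to avoid overflow in exp for large |z|.
    return 1.0 / (1.0 + np.exp(-np.clip(z, -30, 30)))


def toy_contrastive_em(X, y, n_clusters=2, n_iter=20, lr=0.1,
                       grad_steps=50, seed=0):
    """Toy EM-style loop (an assumption-laden sketch, not Deep UCSL itself).

    X: (n, d) features; y: 1.0 for patients, 0.0 for controls.
    Returns soft subgroup assignments for patients plus linear weights.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    patients = y == 1
    Xp = X[patients]
    # Random soft initialisation of patient-to-subgroup responsibilities.
    resp = rng.dirichlet(np.ones(n_clusters), size=Xp.shape[0])
    W = np.zeros((n_clusters, d))
    b = np.zeros(n_clusters)
    for _ in range(n_iter):
        # "Encoder update" step: one weighted logistic model per subgroup,
        # contrasting that subgroup's patients with all controls.
        for k in range(n_clusters):
            weights = np.ones(n)            # controls keep weight 1
            weights[patients] = resp[:, k]  # patients weighted by responsibility
            for _ in range(grad_steps):
                p = sigmoid(X @ W[k] + b[k])
                err = weights * (p - y)
                W[k] -= lr * (X.T @ err) / weights.sum()
                b[k] -= lr * err.sum() / weights.sum()
        # "Subgroup inference" step: softmax over per-subgroup scores
        # (an assumed choice for this toy, not taken from the paper).
        logits = Xp @ W.T + b
        logits -= logits.max(axis=1, keepdims=True)
        resp = np.exp(logits)
        resp /= resp.sum(axis=1, keepdims=True)
    return resp, W, b


# Synthetic data: controls near the origin, two patient subgroups that
# deviate from controls along different directions.
rng = np.random.default_rng(0)
controls = rng.normal(size=(100, 2))
sub_a = rng.normal(size=(50, 2)) + np.array([3.0, 0.0])
sub_b = rng.normal(size=(50, 2)) + np.array([0.0, 3.0])
X = np.vstack([controls, sub_a, sub_b])
y = np.concatenate([np.zeros(100), np.ones(100)])

resp, W, b = toy_contrastive_em(X, y)
hard_assignment = resp.argmax(axis=1)  # one subgroup index per patient
```

Because each subgroup's model is trained against the controls, directions of variation that patients and controls share contribute little to the separating weights. That is the intuition behind contrasting with healthy subjects, though the actual method replaces these linear models with a deep feature extractor and a likelihood-derived loss.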

Problem

Research questions and friction points this paper is trying to address.

Subgroup Discovery

Disease Subgroups

Contrastive Learning

Pathological Factors

Healthy Controls

Innovation

Methods, ideas, or system contributions that make the work stand out.

Contrastive Subgroup Discovery

Deep UCSL

Disease Subtyping