Automatic Discovery of Disease Subgroups by Contrasting with Healthy Controls

📅 2026-05-20
📈 Citations: 0
Influential: 0

🤖 AI Summary
This work addresses biomedical subgroup discovery: identifying disease subtypes that are interpretable, homogeneous, and driven solely by pathology. The authors propose Deep UCSL, a Contrastive Subgroup Discovery method. It assumes that patients and healthy controls share common factors of variation that are irrelevant to the disease, and by contrasting the two groups it learns a discriminative representation space intended to capture only disease-specific variation. The method derives a new loss from the conditional joint likelihood of latent clusters and patient/control labels, optimized with an Expectation-Maximization (EM) strategy that alternates between inferring subgroup assignments and updating the deep feature encoder; a regularization term further discourages the representation from encoding variability shared with controls. On an MNIST-based illustrative example and four real medical imaging datasets, the authors report quantitative improvements in the quality of the estimated subgroups over previous related methods.
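As a reading aid, one standard way to write an objective of this kind is sketched below. The notation is an assumption made here, not taken from the paper: x is an input image, y ∈ {0, 1} is the control/patient label, c ∈ {1, …, K} is the latent subgroup, and Q is an auxiliary distribution over subgroups. The factorization is the chain rule and the bound is the usual Jensen inequality behind EM; the exact loss and regularizer used by Deep UCSL may differ.

```latex
% Chain rule for the conditional joint likelihood (notation assumed for illustration)
p(c, y \mid x) = p(c \mid x)\, p(y \mid c, x)

% Jensen lower bound on the marginal conditional log-likelihood
\log p(y \mid x)
  = \log \sum_{c=1}^{K} p(c, y \mid x)
  \;\ge\; \sum_{c=1}^{K} Q(c) \log \frac{p(c, y \mid x)}{Q(c)}

% The bound is tight when Q is the posterior over subgroups
Q(c) = p(c \mid x, y) = \frac{p(c, y \mid x)}{\sum_{c'=1}^{K} p(c', y \mid x)}
```

Under this reading, the E-step sets Q to the subgroup posterior with the model parameters held fixed, and the M-step updates the parameters to increase the bound with Q held fixed.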
📝 Abstract
In biomedical Subgroup Discovery, practitioners are interested in discovering interpretable and homogeneous subgroups within a group of patients. In this paper, assuming that healthy subjects (i.e., controls) share common but irrelevant factors of variation with the patients, we motivate and develop a Contrastive Subgroup Discovery method, entitled Deep UCSL. By contrasting patients with controls, Deep UCSL identifies subgroups driven solely by pathological factors, ignoring common variability shared with healthy subjects. Our framework employs a deep feature extractor to learn a discriminative representation space. Mathematically, we derive a novel loss based on the conditional joint likelihood of latent clusters and patient/control labels, optimized via an Expectation-Maximization strategy alternating between subgroup inference and feature encoder updates. A regularization term further encourages representations to capture disease-specific variability while ignoring variability shared with controls. Compared to previous related works, our approach quantitatively improves the quality of the estimated subgroups, as demonstrated on an MNIST example and four distinct real medical imaging datasets. Code and datasets are available at: https://github.com/rlouiset/deep_ucsl.
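The alternation described in the abstract can be illustrated with a toy NumPy sketch. This is not the authors' implementation (see their repository for that): it assumes linear logistic classifiers in place of the deep feature extractor, a constant subgroup prior p(c) instead of p(c | x), uses every control for every classifier, and omits the regularization term. All function names (`e_step`, `m_step`, `fit`) and the synthetic 2-D data are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def e_step(x_pat, w, b, prior):
    """Soft subgroup responsibilities q(c | x, y=patient) for each patient."""
    # p(y=1 | c=k, x): how confidently classifier k separates x from controls
    p_y = sigmoid(x_pat @ w.T + b)            # shape (n_patients, k)
    joint = prior * p_y + 1e-12               # p(c=k) * p(y=1 | c=k, x)
    return joint / joint.sum(axis=1, keepdims=True)

def m_step(x_pat, x_ctl, q, w, b, lr=0.1, steps=50):
    """Weighted logistic-regression updates, one classifier per subgroup."""
    n_ctl = len(x_ctl)
    for k in range(w.shape[0]):
        denom = q[:, k].sum() + n_ctl
        for _ in range(steps):
            # Gradient of the weighted negative log-likelihood:
            # patients (target 1) weighted by q[:, k], controls (target 0) by 1
            err_pat = (sigmoid(x_pat @ w[k] + b[k]) - 1.0) * q[:, k]
            err_ctl = sigmoid(x_ctl @ w[k] + b[k])
            w[k] -= lr * (err_pat @ x_pat + err_ctl @ x_ctl) / denom
            b[k] -= lr * (err_pat.sum() + err_ctl.sum()) / denom
    prior = q.mean(axis=0)                    # updated subgroup proportions
    return w, b, prior

def fit(x_pat, x_ctl, k=2, n_iter=20, seed=0):
    """Alternate subgroup inference (E-step) and classifier updates (M-step)."""
    rng = np.random.default_rng(seed)
    w = rng.normal(scale=0.1, size=(k, x_pat.shape[1]))
    b = np.zeros(k)
    prior = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        q = e_step(x_pat, w, b, prior)
        w, b, prior = m_step(x_pat, x_ctl, q, w, b)
    return e_step(x_pat, w, b, prior), w, b

# Synthetic data (assumption): controls around the origin, two patient
# subgroups shifted away from the controls in different directions.
rng = np.random.default_rng(0)
x_ctl = rng.normal(size=(200, 2))
x_pat = np.vstack([
    rng.normal(size=(100, 2)) + np.array([3.0, 0.0]),
    rng.normal(size=(100, 2)) + np.array([0.0, 3.0]),
])
q, w, b = fit(x_pat, x_ctl)
print(q.shape, q.sum(axis=1)[:3])
```

Each row of `q` is a probability distribution over subgroups for one patient. Whether the two synthetic subgroups are actually recovered depends on the initialization, so this sketch should be read as an illustration of the alternation only, not as evidence about the method's behavior.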
Problem

Research questions and friction points this paper is trying to address.

Subgroup Discovery
Disease Subgroups
Contrastive Learning
Pathological Factors
Healthy Controls
Innovation

Methods, ideas, or system contributions that make the work stand out.

Contrastive Subgroup Discovery
Deep UCSL
Disease Subtyping
Conditional Joint Likelihood
Expectation-Maximization
Robin Louiset
NeuroSpin, Université Paris-Saclay, CEA, Gif-sur-Yvette, 91191, France; LTCI, Institut Polytechnique de Paris, Télécom Paris, Palaiseau, 91120, France
Edouard Duchesnay
NeuroSpin, Université Paris-Saclay, CEA, Gif-sur-Yvette, 91191, France
Benoit Dufumier
NeuroSpin, Université Paris-Saclay, CEA, Gif-sur-Yvette, 91191, France
Antoine Grigis
NeuroSpin, Université Paris-Saclay, CEA, Gif-sur-Yvette, 91191, France
Pietro Gori
Télécom Paris (IPParis)
Representation learning, machine learning, medical imaging, computational anatomy