CellCLIP -- Learning Perturbation Effects in Cell Painting via Text-Guided Contrastive Learning

📅 2025-05-16
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the substantial semantic gap between Cell Painting images and heterogeneous perturbations (e.g., small molecules, CRISPR gene knockouts), as well as the challenge of unifying their representations, this paper proposes a cross-modal contrastive learning framework tailored to cellular phenotypic analysis. It introduces a microscopy-channel encoding scheme that explicitly models fluorescence-channel-specific information, and pairs pre-trained ViT/ResNet image encoders with BERT-style text encoders in a CLIP-inspired paradigm to align perturbation descriptions with morphological phenotypes in a shared embedding space. The method improves cross-modal retrieval accuracy, outperforms existing open-source models on downstream tasks including perturbation clustering and mechanism inference, and achieves a 3.2× inference speedup, thereby easing the transfer bottleneck of natural-image-pretrained models in cellular imaging.

📝 Abstract
High-content screening (HCS) assays based on high-throughput microscopy techniques such as Cell Painting have enabled the interrogation of cells' morphological responses to perturbations at an unprecedented scale. The collection of such data promises to facilitate a better understanding of the relationships between different perturbations and their effects on cellular state. Towards achieving this goal, recent advances in cross-modal contrastive learning could, in theory, be leveraged to learn a unified latent space that aligns perturbations with their corresponding morphological effects. However, the application of such methods to HCS data is not straightforward due to substantial differences in the semantics of Cell Painting images compared to natural images, and the difficulty of representing different classes of perturbations (e.g., small molecule vs CRISPR gene knockout) in a single latent space. In response to these challenges, here we introduce CellCLIP, a cross-modal contrastive learning framework for HCS data. CellCLIP leverages pre-trained image encoders coupled with a novel channel encoding scheme to better capture relationships between different microscopy channels in image embeddings, along with natural language encoders for representing perturbations. Our framework outperforms current open-source models, demonstrating the best performance in both cross-modal retrieval and biologically meaningful downstream tasks while also achieving significant reductions in computation time.
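The paper's own channel-encoding scheme is detailed in the full text; purely as an illustration of the general idea (encode each fluorescence channel separately, tag it with a learned channel-identity vector, then pool), a minimal sketch might look like the following. All function names, the additive tagging, and the mean-pooling choice are assumptions for illustration, not CellCLIP's actual design:

```python
import numpy as np

def encode_multichannel(image, channel_encoder, channel_embeddings):
    """Hypothetical channel-encoding sketch for a Cell Painting stack.

    image: (C, H, W) array, one slice per fluorescence channel
    channel_encoder: callable mapping one (H, W) channel -> (D,) embedding
                     (e.g. a frozen pre-trained vision encoder)
    channel_embeddings: (C, D) learned channel-identity vectors
    """
    # Encode each channel independently with the shared encoder.
    per_channel = np.stack([channel_encoder(ch) for ch in image])  # (C, D)
    # Inject channel identity so the model can distinguish stains.
    tagged = per_channel + channel_embeddings
    # Pool across channels into a single image embedding.
    return tagged.mean(axis=0)  # (D,)
```

The key point the sketch captures is that channel identity is represented explicitly rather than treating the five-channel stack as a pseudo-RGB image.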
Problem

Research questions and friction points this paper is trying to address.

Align perturbations with cellular effects in microscopy data
Overcome semantic differences between Cell Painting and natural images
Represent diverse perturbation classes in unified space
Innovation

Methods, ideas, or system contributions that make the work stand out.

Text-guided contrastive learning for cell perturbations
Channel encoding for microscopy image embeddings
Natural language encoders for perturbation representation
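The alignment objective behind this kind of text-guided contrastive learning is, per the abstract, CLIP-style: paired image and perturbation-text embeddings are pulled together while mismatched pairs are pushed apart. A minimal NumPy sketch of the standard symmetric InfoNCE loss (the temperature value and function names here are generic assumptions, not taken from the paper):

```python
import numpy as np

def l2_normalize(x, axis=-1):
    """Project embeddings onto the unit sphere so dot products are cosines."""
    return x / np.linalg.norm(x, axis=axis, keepdims=True)

def clip_contrastive_loss(img_emb, txt_emb, temperature=0.07):
    """Symmetric InfoNCE over a batch of paired image/text embeddings.

    img_emb, txt_emb: (N, D) arrays; row i of each is a matching pair.
    """
    img = l2_normalize(img_emb)
    txt = l2_normalize(txt_emb)
    logits = img @ txt.T / temperature       # (N, N) similarity matrix
    labels = np.arange(len(logits))          # matching pairs lie on the diagonal

    def xent(l):
        l = l - l.max(axis=1, keepdims=True)              # numerical stability
        logp = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -logp[labels, labels].mean()

    # Average the image->text and text->image retrieval losses.
    return 0.5 * (xent(logits) + xent(logits.T))
```

Minimizing this loss yields the shared embedding space in which cross-modal retrieval (given a perturbation, find its phenotype, and vice versa) is a nearest-neighbor lookup.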
Mingyu Lu
Paul G. Allen School of Computer Science & Engineering, University of Washington
Ethan Weinberger
Paul G. Allen School of Computer Science & Engineering, University of Washington
Chanwoo Kim
Paul G. Allen School of Computer Science & Engineering, University of Washington
Su-In Lee
Computer Science & Engineering, University of Washington
AI/ML · Computational biology & medicine