A Cross Modal Knowledge Distillation & Data Augmentation Recipe for Improving Transcriptomics Representations through Morphological Features

📅 2025-05-27
🤖 AI Summary
This study addresses the scarcity of weakly paired multimodal data (transcriptomics + microscopy images), which limits how much morphological information transcriptomic representations can capture and how well they generalize. To this end, we propose Semi-Clipped, a cross-modal knowledge distillation framework, together with Perturbation Embedding Augmentation (PEA). Without requiring strongly aligned labels, Semi-Clipped leverages a pretrained Vision Transformer (ViT), contrastive learning, and multimodal alignment losses to distill morphological knowledge from microscopy images into gene expression embeddings. To our knowledge, this is the first work enabling morphology–transcriptome joint representation learning under weak supervision. Evaluated on multiple cell response prediction tasks, including drug response and perturbation effect estimation, Semi-Clipped achieves state-of-the-art performance, with improved generalization and robustness to input perturbations, while retaining the gene-level interpretability of transcriptomics.

📝 Abstract
Understanding cellular responses to stimuli is crucial for biological discovery and drug development. Transcriptomics provides interpretable, gene-level insights, while microscopy imaging offers rich predictive features but is harder to interpret. Weakly paired datasets, where samples share biological states, enable multimodal learning but are scarce, limiting their utility for training and multimodal inference. We propose a framework to enhance transcriptomics by distilling knowledge from microscopy images. Using weakly paired data, our method aligns and binds modalities, enriching gene expression representations with morphological information. To address data scarcity, we introduce (1) Semi-Clipped, an adaptation of CLIP for cross-modal distillation using pretrained foundation models, achieving state-of-the-art results, and (2) PEA (Perturbation Embedding Augmentation), a novel augmentation technique that enhances transcriptomics data while preserving inherent biological information. These strategies improve the predictive power and retain the interpretability of transcriptomics, enabling rich unimodal representations for complex biological tasks.
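The abstract describes Semi-Clipped as an adaptation of CLIP for cross-modal distillation. As a rough illustration of the underlying idea only, the sketch below implements a plain CLIP-style symmetric contrastive loss between transcriptomics and image embeddings. It is not the authors' Semi-Clipped objective: the function names, the temperature value, and the assumption that the image embeddings come from a frozen pretrained encoder are all illustrative choices, not details from the paper.

```python
import math

def normalize(v):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]

def log_softmax(row):
    """Numerically stable log-softmax of a list of logits."""
    m = max(row)
    log_z = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - log_z for x in row]

def clip_style_loss(tx_embs, img_embs, temperature=0.07):
    """Symmetric contrastive loss; row i of each input is treated as a pair.

    tx_embs:  transcriptomics embeddings (the student being trained).
    img_embs: image embeddings (assumed here to come from a frozen teacher).
    temperature: illustrative default, not a value taken from the paper.
    """
    tx = [normalize(v) for v in tx_embs]
    img = [normalize(v) for v in img_embs]
    n = len(tx)
    # Pairwise cosine similarities, sharpened by the temperature.
    logits = [[sum(a * b for a, b in zip(t, i)) / temperature for i in img]
              for t in tx]
    # Transcriptomics -> image direction: softmax over each row.
    tx_to_img = -sum(log_softmax(logits[k])[k] for k in range(n)) / n
    # Image -> transcriptomics direction: softmax over each column.
    cols = [list(c) for c in zip(*logits)]
    img_to_tx = -sum(log_softmax(cols[k])[k] for k in range(n)) / n
    return 0.5 * (tx_to_img + img_to_tx)

# Toy usage: correctly matched pairs should score a lower loss than swapped ones.
img = [[1.0, 0.0], [0.0, 1.0]]
aligned = clip_style_loss([[0.9, 0.1], [0.1, 0.9]], img)
swapped = clip_style_loss([[0.1, 0.9], [0.9, 0.1]], img)
```

In a distillation setting of this kind, only the transcriptomics encoder would typically receive gradients, so the resulting gene expression embeddings absorb structure from the image side while remaining usable on their own at inference time.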
Problem

Research questions and friction points this paper is trying to address.

Enhancing transcriptomics using microscopy image knowledge
Addressing data scarcity in weakly paired multimodal datasets
Improving predictive power while retaining interpretability
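Weak pairing, as the abstract describes it, links samples that share a biological state rather than measurements of the same physical cell. A minimal sketch of how such pairs could be assembled, assuming each sample carries a label for its state (the function name `build_weak_pairs`, the tuple layout, and the example labels are hypothetical and not taken from the paper):

```python
import random
from collections import defaultdict

def build_weak_pairs(tx_samples, img_samples, seed=0):
    """Pair each transcriptomics sample with a random image sharing its label.

    Both inputs are lists of (label, sample_id) tuples; the label stands in
    for the shared biological state (for example, a perturbation identifier).
    """
    rng = random.Random(seed)
    imgs_by_label = defaultdict(list)
    for label, img_id in img_samples:
        imgs_by_label[label].append(img_id)

    pairs = []
    for label, tx_id in tx_samples:
        candidates = imgs_by_label.get(label)
        if candidates:  # skip states that have no image counterpart
            pairs.append((tx_id, rng.choice(candidates)))
    return pairs

# Toy usage with made-up labels: "drugC" has no images, so "t3" stays unpaired.
tx = [("drugA", "t1"), ("drugB", "t2"), ("drugC", "t3")]
img = [("drugA", "i1"), ("drugA", "i2"), ("drugB", "i3")]
pairs = build_weak_pairs(tx, img)
```

The toy example also shows why scarcity matters: any state measured in only one modality contributes no pairs at all.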
Innovation

Methods, ideas, or system contributions that make the work stand out.

Cross-modal knowledge distillation from microscopy
Semi-Clipped adaptation for distillation
Perturbation Embedding Augmentation technique
Ihab Bendidi
Recursion, Salt Lake City, USA; Valence Labs, Montréal, Canada; Ecole Normale Supérieure PSL, Paris, France
Yassir El Mesbahi
Recursion, Salt Lake City, USA; Valence Labs, Montréal, Canada
Alisandra K. Denton
Valence Labs, QC, Canada
bioinformatics, deep learning, biology, multimodality, perturbation analysis
Karush Suri
Recursion, Salt Lake City, USA; Valence Labs, Montréal, Canada
Kian Kenyon-Dean
McGill University
natural language processing, machine learning, deep learning, clustering
Auguste Genovesio
Ecole Normale Supérieure
deep learning, computational biology, imaging
Emmanuel Noutahi
Valence Labs
representation learning, generative models, drug design, genome evolution, molecular optimization