A Cross Modal Knowledge Distillation & Data Augmentation Recipe for Improving Transcriptomics Representations through Morphological Features

📅 2025-05-27

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This study addresses two obstacles to richer transcriptomic representations: transcriptomics alone carries less predictive signal than microscopy imaging, and weakly paired multimodal data (transcriptomics + microscopy images) are scarce. To this end, we propose Semi-Clipped, a cross-modal knowledge distillation framework, together with Perturbation Embedding Augmentation (PEA). Without requiring strongly paired samples, Semi-Clipped leverages a pretrained Vision Transformer (ViT), contrastive learning, and multimodal alignment losses to distill morphological knowledge from microscopy images into gene expression embeddings. To our knowledge, this is the first work enabling morphology–transcriptome joint representation learning under weak supervision. Evaluated on multiple cell response prediction tasks, including drug response and perturbation effect estimation, Semi-Clipped achieves state-of-the-art performance, demonstrating stronger generalization and robustness to input perturbations while retaining gene-level interpretability.

📝 Abstract

Understanding cellular responses to stimuli is crucial for biological discovery and drug development. Transcriptomics provides interpretable, gene-level insights, while microscopy imaging offers rich predictive features but is harder to interpret. Weakly paired datasets, where samples share biological states, enable multimodal learning but are scarce, limiting their utility for training and multimodal inference. We propose a framework to enhance transcriptomics by distilling knowledge from microscopy images. Using weakly paired data, our method aligns and binds modalities, enriching gene expression representations with morphological information. To address data scarcity, we introduce (1) Semi-Clipped, an adaptation of CLIP for cross-modal distillation using pretrained foundation models, achieving state-of-the-art results, and (2) PEA (Perturbation Embedding Augmentation), a novel augmentation technique that enhances transcriptomics data while preserving inherent biological information. These strategies improve the predictive power and retain the interpretability of transcriptomics, enabling rich unimodal representations for complex biological tasks.

Problem

Research questions and friction points this paper is trying to address.

Enhancing transcriptomics using microscopy image knowledge

Addressing data scarcity in weakly paired multimodal datasets

Improving predictive power while retaining interpretability
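To make "weakly paired" concrete, here is a minimal sketch of one way to build such pairs: each transcriptomics sample is matched with a randomly chosen image that shares its biological state. It is not the paper's pipeline. The function name `weak_pairs`, the `(sample_id, label)` tuple layout, and the use of a perturbation label as the shared state are all illustrative assumptions.

```python
import random
from collections import defaultdict

def weak_pairs(tx_samples, img_samples, seed=0):
    """Pair each transcriptomics sample with a random image sharing its label.

    Both inputs are lists of (sample_id, label) tuples (a hypothetical layout).
    The label stands in for the shared biological state, e.g. a perturbation.
    """
    rng = random.Random(seed)

    # Index the images by the label they share with the transcriptomics samples.
    images_by_label = defaultdict(list)
    for img_id, label in img_samples:
        images_by_label[label].append(img_id)

    pairs = []
    for tx_id, label in tx_samples:
        candidates = images_by_label.get(label)
        if candidates:  # skip states that have no imaging counterpart
            pairs.append((tx_id, rng.choice(candidates)))
    return pairs

tx = [("tx0", "drugA"), ("tx1", "drugB"), ("tx2", "drugC")]
img = [("img0", "drugA"), ("img1", "drugA"), ("img2", "drugB")]
pairs = weak_pairs(tx, img)
```

In this toy input, `tx2` has no image with a matching label and is dropped, which illustrates why scarcity of weakly paired data limits training.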

Innovation

Methods, ideas, or system contributions that make the work stand out.

Cross-modal knowledge distillation from microscopy

Semi-Clipped adaptation for distillation

Perturbation Embedding Augmentation technique
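Since Semi-Clipped is described as an adaptation of CLIP, a generic CLIP-style symmetric contrastive loss gives a rough picture of the kind of objective involved. The sketch below is not the paper's implementation. It assumes the image embeddings come precomputed from a frozen pretrained model and only the transcriptomics embeddings would be trained; the function names and the temperature value are illustrative choices.

```python
import math

def l2_normalize(v):
    """Scale a vector to unit length (left unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]

def cross_entropy_diag(logits):
    """Mean cross-entropy where the correct class for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the max for a numerically stable log-sum-exp
        log_z = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_z - row[i]
    return total / len(logits)

def clip_style_loss(tx_emb, img_emb, temperature=0.1):
    """Symmetric contrastive loss over a batch of weakly paired embeddings.

    tx_emb[i] and img_emb[i] are assumed to share a biological state.
    """
    tx = [l2_normalize(v) for v in tx_emb]
    im = [l2_normalize(v) for v in img_emb]
    # Cosine similarity between every transcriptomics/image combination.
    logits = [[sum(a * b for a, b in zip(t, i)) / temperature for i in im]
              for t in tx]
    logits_t = [list(col) for col in zip(*logits)]  # image-to-transcriptomics
    return 0.5 * (cross_entropy_diag(logits) + cross_entropy_diag(logits_t))

embeddings = [[1.0, 0.0], [0.0, 1.0]]
aligned_loss = clip_style_loss(embeddings, embeddings)
mismatched_loss = clip_style_loss(embeddings, embeddings[::-1])
```

The loss is small when each transcriptomics embedding is closest to its own paired image embedding and large when the pairs are swapped, which is the signal that pulls morphological structure into the gene expression representation.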
