scCBGM: Interpretable Single-Cell Counterfactual Editing

📅 2026-06-05
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study addresses the combinatorial explosion of conditions in single-cell perturbation experiments, which makes exhaustive experimental mapping of cellular phenotypes infeasible. The authors propose the single-cell Concept Bottleneck Generative Model (scCBGM), which adapts concept bottleneck architectures to single-cell data. Decoder skip connections and a cross-covariance penalty promote disentangled representations without dimensional constraints, and the framework extends to flow matching models for concept-guided counterfactual generation and editing. Across multiple real datasets, the method is reported to outperform alternatives in combinatorial generalization and counterfactual prediction. It is validated at the cell level on a synthetic benchmark with ground-truth counterfactuals and at the population level on real experimental data.
📝 Abstract
Understanding cellular phenotypes and how they respond to perturbations is critical for disease biology and therapeutic design. Single-cell RNA sequencing enables characterization at cellular resolution, yet the combinatorial space of conditions makes exhaustive experimental mapping infeasible. We introduce single-cell Concept Bottleneck Generative Models (scCBGM), a framework for interpretable and precise counterfactual editing of individual cells. scCBGM adapts concept bottleneck architectures for single-cell data through decoder skip connections and a cross-covariance penalty that promotes disentanglement without dimensional constraints. We extend the framework to flow matching models, enabling concept-guided editing in both encoding-decoding and generation regimes. To enable rigorous evaluation, we develop a synthetic benchmark with ground-truth counterfactuals. Across multiple real datasets, scCBGM demonstrates superior performance in combinatorial generalization and counterfactual prediction, supported by cell-level validation on synthetic data and population-level benchmarks on real datasets.
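The abstract does not spell out the form of the cross-covariance penalty, so the following is only a minimal sketch of one common formulation: the squared Frobenius norm of the batch cross-covariance between concept activations and the remaining latent dimensions. The function name, array shapes, and toy data are assumptions made for illustration, not the authors' implementation.

```python
import numpy as np


def cross_covariance_penalty(concepts, residual):
    """Squared Frobenius norm of the batch cross-covariance of two latent blocks.

    Hypothetical helper: `concepts` has shape (n_cells, n_concepts) and
    `residual` has shape (n_cells, n_residual). The two widths may differ.
    """
    # Center each block over the batch dimension.
    c = concepts - concepts.mean(axis=0, keepdims=True)
    r = residual - residual.mean(axis=0, keepdims=True)
    n = concepts.shape[0]
    # Cross-covariance matrix, shape (n_concepts, n_residual).
    cov = c.T @ r / (n - 1)
    return float(np.sum(cov ** 2))


# Toy data (assumed for illustration only): 512 cells, 3 concepts, 8 residual dims.
rng = np.random.default_rng(0)
concepts = rng.normal(size=(512, 3))
independent = rng.normal(size=(512, 8))

# A residual block that leaks the first concept into its first dimension.
leaky = independent.copy()
leaky[:, 0] += 2.0 * concepts[:, 0]

penalty_independent = cross_covariance_penalty(concepts, independent)
penalty_leaky = cross_covariance_penalty(concepts, leaky)
print(penalty_independent, penalty_leaky)
```

In this sketch the penalty is much larger for the leaky block than for the independent one, so adding it to a training loss would discourage the unsupervised latent from linearly encoding concept information. Because the cross-covariance matrix is rectangular, the residual block can have any width, which is one plausible reading of "without dimensional constraints".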
Problem

Research questions and friction points this paper is trying to address.

single-cell
counterfactual editing
cellular phenotypes
perturbations
combinatorial generalization
Innovation

Methods, ideas, or system contributions that make the work stand out.

concept bottleneck
counterfactual editing
single-cell RNA sequencing
disentanglement
flow matching