CN-SBM: Categorical Block Modelling For Primary and Residual Copy Number Variation

📅 2025-06-28

🤖 AI Summary

This study addresses the discrete, heterogeneous nature of genome-wide copy number variations (CNVs) in tumor clonal evolution. It proposes a two-stage probabilistic co-clustering framework: (1) a bipartite categorical block model jointly clusters samples and genomic regions on their discrete copy number states; (2) a second stage separates the data into a primary component, which captures large-scale chromosomal alterations, and a residual component, which captures finer aberrations. A variational inference algorithm makes the model scalable to large cohorts and high-resolution data. Applied to the low-grade glioma cohort of The Cancer Genome Atlas (TCGA), the model identifies clinically relevant subtypes that aid patient stratification in survival analysis, and it reveals structured variation in the residual CNVs. In benchmarks on simulated and real datasets, it fits the data better than existing methods.

📝 Abstract

Cancer is a genetic disorder whose clonal evolution can be monitored by tracking noisy genome-wide copy number variants. We introduce the Copy Number Stochastic Block Model (CN-SBM), a probabilistic framework that jointly clusters samples and genomic regions based on discrete copy number states using a bipartite categorical block model. Unlike models relying on Gaussian or Poisson assumptions, CN-SBM respects the discrete nature of CNV calls and captures subpopulation-specific patterns through block-wise structure. Using a two-stage approach, CN-SBM decomposes CNV data into primary and residual components, enabling detection of both large-scale chromosomal alterations and finer aberrations. We derive a scalable variational inference algorithm for application to large cohorts and high-resolution data. Benchmarks on simulated and real datasets show improved model fit over existing methods. Applied to TCGA low-grade glioma data, CN-SBM reveals clinically relevant subtypes and structured residual variation, aiding patient stratification in survival analysis. These results establish CN-SBM as an interpretable, scalable framework for CNV analysis with direct relevance for tumor heterogeneity and prognosis.
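
To make the phrase "bipartite categorical block model" concrete, the generative structure it usually denotes can be written as below. This is a sketch of the standard categorical latent block model, not the paper's exact formulation; the symbols $N$, $M$, $K$, $L$, $S$, $\pi$, $\rho$, $\theta$, $z$, and $w$ are notation assumed here for illustration and do not appear in the abstract.

```latex
% Assumed notation: N samples, M genomic regions, S discrete copy number states,
% K sample clusters, L region clusters.
\begin{align*}
  z_i &\sim \mathrm{Categorical}(\pi), & i &= 1, \dots, N, \\
  w_j &\sim \mathrm{Categorical}(\rho), & j &= 1, \dots, M, \\
  X_{ij} \mid z_i = k,\ w_j = l &\sim \mathrm{Categorical}(\theta_{kl}),
    & \theta_{kl} &\in \Delta^{S-1}.
\end{align*}
% A mean-field variational approximation would factorize the posterior as
\begin{equation*}
  q(z, w) = \prod_{i=1}^{N} q(z_i) \prod_{j=1}^{M} q(w_j),
\end{equation*}
% and maximize the evidence lower bound
\begin{equation*}
  \mathcal{L}(q) = \mathbb{E}_{q}\bigl[\log p(X, z, w)\bigr]
                 - \mathbb{E}_{q}\bigl[\log q(z, w)\bigr].
\end{equation*}
```

Under this reading, each block $(k, l)$ carries its own distribution $\theta_{kl}$ over copy number states, which is what lets the model treat CNV calls as categories rather than as Gaussian or Poisson measurements.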

Problem

Research questions and friction points this paper is trying to address.

Clusters samples and genomic regions using discrete CNV states

Decomposes CNV data into primary and residual components

Identifies clinically relevant cancer subtypes for patient stratification
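
The summary does not say how the primary and residual components are defined. As one illustrative possibility only, the sketch below takes the most frequent copy number state in each (sample cluster, region cluster) block as the primary signal and flags every entry that departs from it as residual. The function name `split_primary_residual` and its arguments are hypothetical and not taken from the paper.

```python
import numpy as np

def split_primary_residual(X, z, w, n_states):
    """Illustrative split of a CNV state matrix (an assumption, not the paper's method).

    X: (n_samples, n_regions) integer array of states in [0, n_states).
    z: sample cluster label per row; w: region cluster label per column.
    Returns primary (the modal state of each entry's block) and a boolean
    residual mask that is True where the observed state differs from it.
    """
    K, L = z.max() + 1, w.max() + 1
    counts = np.zeros((K, L, n_states), dtype=int)
    # Accumulate state counts per (sample cluster, region cluster) block.
    np.add.at(counts, (z[:, None], w[None, :], X), 1)
    modal = counts.argmax(axis=2)             # (K, L) modal state per block
    primary = modal[z[:, None], w[None, :]]   # broadcast back to X's shape
    residual = X != primary
    return primary, residual

# Toy data: two sample clusters, two region clusters, three states.
X = np.array([[2, 2, 1],
              [2, 1, 1],
              [0, 0, 1],
              [0, 0, 1]])
z = np.array([0, 0, 1, 1])
w = np.array([0, 0, 1])
primary, residual = split_primary_residual(X, z, w, n_states=3)
```

In the toy matrix, only the entry in the second row and second column departs from its block's modal state, so it is the only one flagged as residual. A second clustering pass could then be run on such residual entries, which is roughly what a two-stage approach implies.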

Innovation

Methods, ideas, or system contributions that make the work stand out.

Bipartite categorical block model for clustering

Two-stage decomposition of CNV data

Scalable variational inference algorithm
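
The three contributions above can be illustrated with a small co-clustering sketch. It is not the paper's variational algorithm: it substitutes a simpler hard-assignment alternating scheme (a classification-EM style update) for a categorical block model, and the function name, arguments, and smoothing constant `alpha` are all assumptions made for this example.

```python
import numpy as np

def fit_categorical_coclustering(X, n_row, n_col, n_states,
                                 n_iter=20, alpha=1.0, seed=0):
    """Hard-assignment fit of a bipartite categorical block model (sketch).

    X: (n_samples, n_regions) integer array of states in [0, n_states).
    Returns sample labels z, region labels w, and theta[k, l, s], the
    estimated probability of state s in block (k, l).
    """
    rng = np.random.default_rng(seed)
    n, m = X.shape
    z = rng.integers(n_row, size=n)   # random initial sample clusters
    w = rng.integers(n_col, size=m)   # random initial region clusters
    onehot = np.eye(n_states)[X]      # (n, m, n_states) indicator tensor

    def block_probs(z, w):
        # Smoothed per-block state frequencies; alpha keeps every entry positive.
        Z, W = np.eye(n_row)[z], np.eye(n_col)[w]
        counts = np.einsum("ik,jl,ijs->kls", Z, W, onehot) + alpha
        return counts / counts.sum(axis=2, keepdims=True)

    for _ in range(n_iter):
        log_theta = np.log(block_probs(z, w))
        # Move each sample to the row cluster with the highest log-likelihood.
        W = np.eye(n_col)[w]
        z = np.einsum("jl,ijs,kls->ik", W, onehot, log_theta).argmax(axis=1)
        # Move each region likewise, given the updated sample labels.
        Z = np.eye(n_row)[z]
        w = np.einsum("ik,ijs,kls->jl", Z, onehot, log_theta).argmax(axis=1)

    return z, w, block_probs(z, w)

# Usage on random toy data: 12 samples, 10 regions, 3 copy number states.
X_demo = np.random.default_rng(1).integers(0, 3, size=(12, 10))
z_demo, w_demo, theta_demo = fit_categorical_coclustering(X_demo, 2, 3, 3)
```

Hard assignments keep the example short, but they discard the uncertainty that a variational posterior over cluster memberships would retain, and they can stall in poor local optima, so several random restarts would be advisable in practice.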
