CN-SBM: Categorical Block Modelling For Primary and Residual Copy Number Variation

📅 2025-06-28
🤖 AI Summary
This study addresses the discreteness and heterogeneity of genome-wide copy number variations (CNVs) in tumor clonal evolution. The authors propose a two-stage probabilistic clustering framework: (1) a bipartite categorical block model jointly clusters samples and genomic regions; (2) a second stage separates primary, large-scale CNV patterns from finer residual ones. A variational inference algorithm makes the method scalable to large cohorts and high-resolution data. Applied to The Cancer Genome Atlas (TCGA) low-grade glioma cohort, the model identifies clinically relevant subtypes that aid patient stratification in survival analysis, and it characterizes the structure of the residual CNV variation. On simulated and real datasets it fits the data better than existing methods. By modeling primary and residual CNV structure separately, the work offers an interpretable, scalable framework for CNV analysis.

📝 Abstract
Cancer is a genetic disorder whose clonal evolution can be monitored by tracking noisy genome-wide copy number variants. We introduce the Copy Number Stochastic Block Model (CN-SBM), a probabilistic framework that jointly clusters samples and genomic regions based on discrete copy number states using a bipartite categorical block model. Unlike models relying on Gaussian or Poisson assumptions, CN-SBM respects the discrete nature of CNV calls and captures subpopulation-specific patterns through block-wise structure. Using a two-stage approach, CN-SBM decomposes CNV data into primary and residual components, enabling detection of both large-scale chromosomal alterations and finer aberrations. We derive a scalable variational inference algorithm for application to large cohorts and high-resolution data. Benchmarks on simulated and real datasets show improved model fit over existing methods. Applied to TCGA low-grade glioma data, CN-SBM reveals clinically relevant subtypes and structured residual variation, aiding patient stratification in survival analysis. These results establish CN-SBM as an interpretable, scalable framework for CNV analysis with direct relevance for tumor heterogeneity and prognosis.
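To make the idea of a bipartite categorical block model concrete, the following is a minimal sketch, not the authors' implementation. It assumes each sample has a latent row cluster, each genomic region has a latent column cluster, and every (row cluster, column cluster) block has its own probability vector over discrete copy number states. The function names, the Dirichlet prior used for simulation, and all parameter values are hypothetical choices made for illustration.

```python
import numpy as np

def simulate_cn_blocks(n_samples, n_regions, n_row_clusters, n_col_clusters,
                       n_states, seed=0):
    """Draw a categorical copy number matrix from a bipartite block model."""
    rng = np.random.default_rng(seed)
    z = rng.integers(n_row_clusters, size=n_samples)   # sample cluster labels
    w = rng.integers(n_col_clusters, size=n_regions)   # region cluster labels
    # theta[k, l] is a probability vector over states for block (k, l).
    # The Dirichlet(0.5) prior is an assumption made for this sketch.
    theta = rng.dirichlet(np.full(n_states, 0.5),
                          size=(n_row_clusters, n_col_clusters))
    probs = theta[z][:, w]                              # (samples, regions, states)
    # Inverse-CDF sampling of one categorical state per matrix entry.
    cdf = probs.cumsum(axis=-1)
    u = rng.random((n_samples, n_regions, 1))
    x = np.minimum((u > cdf).sum(axis=-1), n_states - 1)  # guard against rounding
    return x, z, w, theta

def block_log_likelihood(x, z, w, theta, eps=1e-12):
    """Log-likelihood of the matrix x given cluster labels and block parameters."""
    p = theta[z[:, None], w[None, :], x]                # probability of each entry
    return float(np.log(p + eps).sum())

if __name__ == "__main__":
    x, z, w, theta = simulate_cn_blocks(50, 80, 3, 4, 5, seed=1)
    print(x.shape, block_log_likelihood(x, z, w, theta))
```

Because the likelihood is categorical, no Gaussian or Poisson assumption about the copy number values is needed; the states are treated purely as labels.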
Problem

Research questions and friction points this paper is trying to address.

Clusters samples and genomic regions using discrete CNV states
Decomposes CNV data into primary and residual components
Identifies clinically relevant cancer subtypes for patient stratification
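The page does not say how the residual component is defined. As one illustrative possibility only, the sketch below takes the primary component of each entry to be the most common copy number state in its block and marks an entry as residual when it deviates from that state. The function name, the modal-state rule, and the `-1` "no deviation" code are all assumptions introduced here, not details from the paper.

```python
import numpy as np

def split_primary_residual(x, z, w, n_states):
    """Split a categorical matrix into a block-level primary part and deviations.

    x: (n_samples, n_regions) integer copy number states
    z: row (sample) cluster labels, w: column (region) cluster labels
    """
    n_row, n_col = int(z.max()) + 1, int(w.max()) + 1
    zz = np.broadcast_to(z[:, None], x.shape).ravel()
    ww = np.broadcast_to(w[None, :], x.shape).ravel()
    # Count how often each state occurs inside each block.
    counts = np.zeros((n_row, n_col, n_states), dtype=int)
    np.add.at(counts, (zz, ww, x.ravel()), 1)
    modal = counts.argmax(axis=-1)              # most common state per block
    primary = modal[z[:, None], w[None, :]]     # expand back to matrix shape
    # Assumed encoding: -1 where the entry matches its block's modal state,
    # otherwise the observed state itself.
    residual = np.where(x == primary, -1, x)
    return primary, residual

if __name__ == "__main__":
    x = np.array([[2, 2, 1, 1],
                  [2, 2, 1, 1],
                  [2, 0, 1, 1],
                  [1, 1, 3, 3]])
    z = np.array([0, 0, 0, 1])
    w = np.array([0, 0, 1, 1])
    primary, residual = split_primary_residual(x, z, w, n_states=4)
    print(primary)
    print(residual)
```

Under this reading, a second block model could then be fitted to the residual matrix to look for finer structure, which is the spirit of a two-stage decomposition.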
Innovation

Methods, ideas, or system contributions that make the work stand out.

Bipartite categorical block model for clustering
Two-stage decomposition of CNV data
Scalable variational inference algorithm
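The items above can be illustrated with a generic mean-field variational EM loop for a categorical latent block model. This is a textbook-style sketch under stated assumptions, not the algorithm derived in the paper: soft assignments `r` (samples) and `c` (regions) are updated in turn, and block parameters are smoothed point estimates rather than full posteriors. The function names, the smoothing constant `alpha`, and the random initialization are hypothetical.

```python
import numpy as np

def softmax_rows(a):
    """Row-wise softmax with the usual max-subtraction for stability."""
    a = a - a.max(axis=1, keepdims=True)
    e = np.exp(a)
    return e / e.sum(axis=1, keepdims=True)

def fit_block_model(x, n_row_clusters, n_col_clusters, n_states,
                    n_iter=50, alpha=1.0, seed=0):
    """Coordinate-ascent updates for soft row and column cluster assignments."""
    rng = np.random.default_rng(seed)
    n, m = x.shape
    x_onehot = np.eye(n_states)[x]                        # (n, m, n_states)
    r = rng.dirichlet(np.ones(n_row_clusters), size=n)    # q(z_i), rows sum to 1
    c = rng.dirichlet(np.ones(n_col_clusters), size=m)    # q(w_j), rows sum to 1
    for _ in range(n_iter):
        # Expected state counts per block, smoothed by alpha.
        counts = np.einsum("ik,jl,ijs->kls", r, c, x_onehot) + alpha
        theta = counts / counts.sum(axis=-1, keepdims=True)
        log_theta = np.log(theta)
        # Smoothed cluster proportions.
        pi = (r.sum(axis=0) + alpha) / (n + alpha * n_row_clusters)
        rho = (c.sum(axis=0) + alpha) / (m + alpha * n_col_clusters)
        # Update sample assignments given current region assignments.
        r = softmax_rows(np.log(pi) +
                         np.einsum("jl,ijs,kls->ik", c, x_onehot, log_theta))
        # Update region assignments given the refreshed sample assignments.
        c = softmax_rows(np.log(rho) +
                         np.einsum("ik,ijs,kls->jl", r, x_onehot, log_theta))
    return r, c, theta

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.integers(0, 3, size=(12, 15))
    r, c, theta = fit_block_model(x, 2, 3, 3, n_iter=10)
    print(r.argmax(axis=1), c.argmax(axis=1))
```

Each iteration touches every matrix entry a fixed number of times, which is why this family of updates scales to large sample-by-region matrices. Like other coordinate-ascent schemes, it can converge to a local optimum, so several random restarts are usually advisable.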
Kevin Lam
Department of Statistics, University of British Columbia
William Daniels
Department of Molecular Oncology, BC Cancer Research Centre
J Maxwell Douglas
Department of Molecular Oncology, BC Cancer Research Centre
Daniel Lai
Department of Molecular Oncology, BC Cancer Research Centre
Samuel Aparicio
UBC, BC Cancer Agency, Vancouver
genomics, cancer, evolution, single cell biology, computational biology
Benjamin Bloem-Reddy
University of British Columbia
Statistics, Machine Learning, Applied Probability
Yongjin Park
Department of Statistics, University of British Columbia