🤖 AI Summary
This study addresses the discreteness and heterogeneity of genome-wide copy number variations (CNVs) in tumor clonal evolution. We propose a two-stage probabilistic joint clustering framework: (1) a bipartite categorical block model jointly clusters samples and genomic regions; (2) a subsequent stage separates clonal trunk CNV patterns from subclonal residual ones. Variational inference enables scalable optimization, accommodating both large cohorts and high-resolution data. Applied to The Cancer Genome Atlas (TCGA) low-grade glioma cohort, our model identifies clinically distinct molecular subtypes with significant survival differences and systematically characterizes structural features of subclonal residual CNVs. It achieves better goodness-of-fit than existing methods on simulated and real datasets. By explicitly modeling CNV heterogeneity at both the primary and residual levels, this work offers an interpretable, scalable framework for CNV-driven clonal evolution modeling.
📝 Abstract
Cancer is a genetic disorder whose clonal evolution can be monitored by tracking noisy genome-wide copy number variants (CNVs). We introduce the Copy Number Stochastic Block Model (CN-SBM), a probabilistic framework that jointly clusters samples and genomic regions based on discrete copy number states using a bipartite categorical block model. Unlike models relying on Gaussian or Poisson assumptions, CN-SBM respects the discrete nature of CNV calls and captures subpopulation-specific patterns through block-wise structure. Using a two-stage approach, CN-SBM decomposes CNV data into primary and residual components, enabling detection of both large-scale chromosomal alterations and finer aberrations. We derive a scalable variational inference algorithm for application to large cohorts and high-resolution data. Benchmarks on simulated and real datasets show improved model fit over existing methods. Applied to TCGA low-grade glioma data, CN-SBM reveals clinically relevant subtypes and structured residual variation, aiding patient stratification in survival analysis. These results establish CN-SBM as an interpretable, scalable framework for CNV analysis with direct relevance for tumor heterogeneity and prognosis.
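To make the idea of a bipartite categorical block model concrete, the following is a minimal sketch and not the authors' implementation. It assumes hard cluster assignments and smoothed count estimates in place of the variational inference described in the abstract, and it does not attempt the second (residual) stage. The function names `block_probs` and `log_likelihood`, the pseudocount `alpha`, and the toy matrix are all hypothetical and chosen only for illustration.

```python
import math

def block_probs(X, z, w, K, L, C, alpha=1.0):
    """Estimate a categorical distribution over C copy number states for each
    (sample cluster, region cluster) block, using pseudocount smoothing.
    Assumption: hard assignments z (samples) and w (regions) are given."""
    counts = [[[alpha] * C for _ in range(L)] for _ in range(K)]
    for i, row in enumerate(X):
        for j, state in enumerate(row):
            counts[z[i]][w[j]][state] += 1.0
    probs = []
    for k in range(K):
        probs.append([])
        for l in range(L):
            total = sum(counts[k][l])
            probs[k].append([c / total for c in counts[k][l]])
    return probs

def log_likelihood(X, z, w, probs):
    """Sum of log block probabilities of every observed state."""
    return sum(
        math.log(probs[z[i]][w[j]][state])
        for i, row in enumerate(X)
        for j, state in enumerate(row)
    )

# Toy data (illustrative only): 4 samples x 4 regions,
# states coded as 0 = loss, 1 = neutral, 2 = gain.
X = [
    [2, 2, 1, 1],
    [2, 2, 1, 1],
    [1, 1, 0, 0],
    [1, 1, 0, 0],
]
K, L, C = 2, 2, 3

# Assignment that matches the block structure of X.
z_good, w_good = [0, 0, 1, 1], [0, 0, 1, 1]
# Assignment that mixes the blocks.
z_bad, w_bad = [0, 1, 0, 1], [0, 1, 0, 1]

ll_good = log_likelihood(X, z_good, w_good, block_probs(X, z_good, w_good, K, L, C))
ll_bad = log_likelihood(X, z_bad, w_bad, block_probs(X, z_bad, w_bad, K, L, C))
print(ll_good > ll_bad)
```

Because each entry is scored under a categorical distribution over discrete states, no Gaussian or Poisson assumption is needed; the assignment that aligns samples and regions with the homogeneous blocks yields the higher log-likelihood on this toy matrix.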