🤖 AI Summary
Clinical differentiation between Crohn’s disease (CD) and ulcerative colitis (UC) remains challenging because their clinical presentations overlap. Method: We propose a novel interpretable machine learning framework for inflammatory bowel disease (IBD) subtyping based on spatial transcriptomics. Using non-negative matrix factorization (NMF), we identified four recurring cellular niches (distinct functional microenvironments) and systematically quantified their composition, neighborhood enrichment, and niche-specific gene expression signals, yielding 44 spatial features. A multilayer perceptron (MLP) classifier was trained on these features, and model explainability analyses were used to link its predictions to underlying biological mechanisms. Contribution/Results: The model achieves an accuracy of 0.774 ± 0.161 in three-class classification (healthy/UC/CD) and 0.916 ± 0.118 in binary classification of IBD versus healthy tissue. Explainability analysis shows that disrupted spatial organization of niches is the strongest predictor of inflammation in general, whereas distinguishing UC from CD relies on niche-specific gene expression signatures. The framework combines accurate prediction with mechanistic biological insight, offering a potential new approach to precision IBD subtyping.
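The niche-discovery and composition steps can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the summary does not say which matrix NMF was applied to, so the sketch assumes a non-negative spot-by-cell-type abundance matrix pooled across samples, assigns each spot to the niche with its largest NMF loading, and computes each sample's niche composition as the fraction of its spots in each niche. The function names, the synthetic data, and the argmax assignment rule are all hypothetical.

```python
import numpy as np
from sklearn.decomposition import NMF


def find_niches(spot_by_celltype, n_niches=4, seed=0):
    """Factorize a non-negative (spots x cell types) matrix into niches.

    Returns per-spot niche labels (argmax of the spot loadings W) and the
    (niches x cell types) profile matrix H. Hypothetical helper.
    """
    model = NMF(n_components=n_niches, init="nndsvda", max_iter=1000, random_state=seed)
    W = model.fit_transform(spot_by_celltype)  # spots x niches
    H = model.components_                      # niches x cell types
    return W.argmax(axis=1), H


def niche_composition(labels, sample_ids, n_niches=4):
    """Fraction of each sample's spots assigned to each niche (one row per sample)."""
    samples = np.unique(sample_ids)
    comp = np.zeros((len(samples), n_niches))
    for i, s in enumerate(samples):
        counts = np.bincount(labels[sample_ids == s], minlength=n_niches)
        comp[i] = counts / counts.sum()
    return samples, comp


# Synthetic stand-in data (hypothetical): 5 samples x 100 spots, 12 cell types.
rng = np.random.default_rng(0)
X = rng.gamma(shape=2.0, scale=1.0, size=(500, 12))
sample_ids = np.repeat(np.arange(5), 100)

labels, H = find_niches(X)
samples, comp = niche_composition(labels, sample_ids)
print(H.shape, comp.shape)  # (4, 12) (5, 4)
```

Hard argmax assignment is only one simple choice; keeping the continuous loadings `W` as soft niche proportions would be an equally plausible variant.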
📝 Abstract
Differentiating between the two main subtypes of Inflammatory Bowel Disease (IBD), Crohn’s disease (CD) and ulcerative colitis (UC), is a persistent clinical challenge due to their overlapping presentations. This study introduces a novel computational framework that uses spatial transcriptomics (ST) to build an explainable machine learning model for IBD classification. We analyzed ST data from the colonic mucosa of healthy controls (HC), UC patients, and CD patients. Using Non-negative Matrix Factorization (NMF), we first identified four recurring cellular niches, representing distinct functional microenvironments within the tissue. From these niches, we systematically engineered 44 features capturing three key aspects of tissue pathology: niche composition, neighborhood enrichment, and niche-gene signals. A multilayer perceptron (MLP) classifier trained on these features achieved an accuracy of 0.774 ± 0.161 on the more challenging three-class problem (HC, UC, and CD) and 0.916 ± 0.118 on the two-class problem of distinguishing IBD from healthy tissue. Crucially, model explainability analysis revealed that disruptions in the spatial organization of niches were the strongest predictors of general inflammation, whereas classification between UC and CD relied on specific niche-gene expression signatures. This work provides a robust proof-of-concept pipeline that transforms descriptive spatial data into an accurate and explainable predictive tool, offering both a potential new diagnostic paradigm and deeper insight into the distinct biological mechanisms that drive IBD subtypes.
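Neighborhood enrichment features can be illustrated with a permutation test, similar in spirit to squidpy's `gr.nhood_enrichment`. The sketch below is an assumption-laden illustration rather than the authors' implementation: it builds a k-nearest-neighbour spot graph (k = 6 is a hypothetical choice), counts niche-to-niche contacts, and converts them to z-scores against a null distribution obtained by shuffling niche labels; the upper triangle of the resulting matrix gives one feature per niche pair. The helper names and the toy grid data are hypothetical.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors


def knn_edges(coords, k=6):
    """Directed k-nearest-neighbour edges. Assumes no duplicate coordinates,
    so each spot's first returned neighbour is itself and can be dropped."""
    _, idx = NearestNeighbors(n_neighbors=k + 1).fit(coords).kneighbors(coords)
    src = np.repeat(np.arange(len(coords)), k)
    dst = idx[:, 1:].ravel()
    return src, dst


def contact_counts(labels, src, dst, n_niches):
    """Symmetric niche-by-niche contact counts over the edge list."""
    counts = np.zeros((n_niches, n_niches))
    np.add.at(counts, (labels[src], labels[dst]), 1)
    return counts + counts.T


def neighborhood_enrichment(coords, labels, n_niches=4, k=6, n_perm=200, seed=0):
    """Z-scores of observed contacts versus a label-permutation null."""
    rng = np.random.default_rng(seed)
    src, dst = knn_edges(coords, k)
    observed = contact_counts(labels, src, dst, n_niches)
    null = np.stack([contact_counts(rng.permutation(labels), src, dst, n_niches)
                     for _ in range(n_perm)])
    sd = null.std(axis=0)
    return (observed - null.mean(axis=0)) / np.where(sd > 0, sd, 1.0)


# Toy example (hypothetical): a 20 x 20 grid split into four quadrant niches.
xs, ys = np.meshgrid(np.arange(20), np.arange(20))
coords = np.column_stack([xs.ravel(), ys.ravel()]).astype(float)
labels = 2 * (coords[:, 0] >= 10).astype(int) + (coords[:, 1] >= 10).astype(int)

z = neighborhood_enrichment(coords, labels)
features = z[np.triu_indices(4)]  # one feature per unordered niche pair: 10 values
print(features.shape)  # (10,)
```

On the toy quadrant layout, same-niche contacts come out strongly enriched (positive diagonal), while the two diagonally opposite quadrants, which barely touch, are depleted. Shuffling labels over a fixed graph preserves each niche's abundance, so the score reflects spatial arrangement rather than composition, which is the distinction the explainability results turn on.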