Network-based Neighborhood regression

📅 2024-07-04
🏛️ arXiv.org
📈 Citations: 0 (influential: 0)
🤖 AI Summary
Existing statistical analyses of biological modules often neglect network topology and therefore fail to capture global community structure and local connectivity patterns at the same time. To address this, we propose a neighborhood regression framework for module-level regulatory inference that jointly models community-level architecture and node-wise topological neighborhoods. The method uses community-wise least-squares optimization and, via random graph theory, yields non-asymptotic estimation error bounds. The estimator achieves linear consistency (an error rate of O(1/n) in the number of nodes n, rather than the O(1/√n) rate of root-n consistency) and exact minimax optimality, while still enabling asymptotic inference at the module level. Applied to autism whole-exome sequencing and RNA-seq data, the framework identifies associations between gene modules carrying genetic variants and gene modules showing differential expression, illustrating both its biological interpretability and its practical utility.

📝 Abstract
Given the ubiquity of modularity in biological systems, module-level regulation analysis is vital for understanding biological systems across various levels and their dynamics. Current statistical analyses of biological modules predominantly focus either on detecting functional modules in biological networks or on sub-group regression of biological features without using the network data. This paper proposes a novel network-based neighborhood regression framework whose regression functions depend on both global community-level information and local connectivity structures among entities. An efficient community-wise least-squares optimization approach is developed to uncover the strength of regulation among the network modules while enabling asymptotic inference. Using random graph theory, we derive non-asymptotic estimation error bounds for the proposed estimator and show that it achieves exact minimax optimality. Unlike the root-n consistency typical of canonical linear regression, our model exhibits linear consistency in the number of nodes n, highlighting the advantage of incorporating neighborhood information. Extensive numerical experiments further support the effectiveness of the proposed framework. An application to whole-exome sequencing and RNA-sequencing autism datasets demonstrates the use of the proposed method in identifying associations between gene modules of genetic variations and gene modules of genomic differential expression.
Problem

Analyzes module-level regulation in biological networks using global and local connectivity.
Develops a network-based neighborhood regression framework for minimax-optimal estimation of regulation strength.
Identifies associations between gene modules in genetic variations and differential expressions.
Innovation

Network-based neighborhood regression framework
Community-wise least-squares optimization
Random graph theory for non-asymptotic error bounds
Yaoming Zhen
Department of Statistical Sciences, University of Toronto, Toronto, Ontario, M5G 1Z5, Canada
Jin-Hong Du
Carnegie Mellon University
high-dimensional statistics, overparameterized learning, single-cell data analysis