🤖 AI Summary
This study addresses the computational and interpretability challenges of jointly modeling high-dimensional predictors and responses, where most existing screening approaches reduce dimensionality only on the predictor side. To overcome these limitations, the work proposes the Graph Independence Dual Screening (GIDS) framework, which reduces the dimensionality of predictors and responses simultaneously and comes with computationally efficient algorithms and supporting theoretical results. By shrinking both sides before downstream selection, GIDS improves computational efficiency, scalability, and interpretability. In simulations, it outperforms existing methods that screen only predictors. Applied to ADNI data, GIDS reduces roughly 865,000 CpG sites and 49,000 transcripts to approximately 9,000 and 2,000 features, respectively, revealing blockwise CpG–gene interaction structures associated with Alzheimer's disease.
📝 Abstract
Modeling interactions among multimodal, high-dimensional data is intrinsically challenging because of ultra-high dimensionality, complex dependence structures, and high noise levels. Screening methods are effective for reducing dimensionality, but most existing approaches shrink only the predictor space while retaining all outcomes. In cross-modal analyses, different outcomes often select different predictor subsets, so the union of selected predictors remains large and the response dimension is unchanged, which limits the practical benefit of screening and leads to heavy computational burdens and poor interpretability. To address these limitations, we propose a new screening framework, Graph Independence Dual Screening (GIDS), which simultaneously reduces the dimensionality of the response variables and the predictors. We design computationally efficient algorithms that facilitate downstream selection procedures, improving accuracy and scalability, and we establish supporting theoretical results. Extensive simulation studies demonstrate that GIDS outperforms existing methods that screen only predictors. To illustrate its utility, we applied GIDS to the Alzheimer's Disease Neuroimaging Initiative (ADNI) dataset, analyzing interactions between 865,353 genome-wide DNA methylation (CpG) sites and 49,386 transcriptomic variables. GIDS reduced the feature space to approximately 9,000 CpGs and 2,000 transcripts, uncovering blockwise interaction structures: clusters of CpG sites and gene transcripts with strong associations. These findings not only improve computational tractability but also yield interpretable biological insights, highlighting coordinated regulatory mechanisms underlying Alzheimer's disease.
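The limitation of predictor-only screening described above can be illustrated with a small toy simulation. The sketch below is not the GIDS algorithm, whose details the abstract does not give. It uses plain marginal-correlation screening as a stand-in, and the sizes (`n`, `p`, `q`, `k`), the data-generating model, and the helper names `pearson` and `screen_predictors` are all assumptions made for illustration.

```python
import random

def pearson(x, y):
    """Sample Pearson correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def screen_predictors(X_cols, y, k):
    """Indices of the k predictors with the largest |marginal correlation| with y."""
    scores = [abs(pearson(col, y)) for col in X_cols]
    ranked = sorted(range(len(X_cols)), key=lambda j: scores[j], reverse=True)
    return set(ranked[:k])

rng = random.Random(0)
n, p, q, k = 200, 50, 6, 2  # toy sizes, chosen only for illustration

# Predictors stored column-wise: X_cols[j] holds the n values of predictor j.
X_cols = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(p)]

# Hypothetical model: response r depends on its own pair of predictors
# (2r and 2r + 1) plus Gaussian noise.
Y_cols = [
    [X_cols[2 * r][i] + X_cols[2 * r + 1][i] + rng.gauss(0, 0.5) for i in range(n)]
    for r in range(q)
]

# Predictor-only screening, run separately for each response.
selected = [screen_predictors(X_cols, y, k) for y in Y_cols]
union = set().union(*selected)

print("predictors kept per response:", k)
print("size of union across responses:", len(union))
print("responses retained:", len(Y_cols))
```

Each response keeps only `k` predictors, but because the responses point to different predictors, the union grows with the number of responses, and all `q` responses are still retained. This is the situation that screening on both the predictor and response sides is meant to address.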