Principal Components Decomposition of Fraction of Variance Explained in High Dimensional Linear Models with Strong Correlation

📅 2026-06-02

🤖 AI Summary

This study addresses the substantial bias of existing fraction of variance explained (FVE) estimators, such as GWASH and LMM-REML, when predictors in a high-dimensional linear model are strongly correlated. To mitigate this issue, the authors propose a two-component FVE estimation framework based on principal component decomposition. The predictor space is split into a low-dimensional subspace spanned by the dominant principal components, which carries the strong correlation, and a high-dimensional complement with the remaining weak correlation. Each component is then estimated with a method suited to its dimension. The resulting estimator reduces the bias induced by strong correlation and is asymptotically consistent as both the number of predictors and the sample size grow. Simulations show reduced bias relative to applying GWASH or LMM-REML directly. An analysis of the ABCD neuroimaging cohort illustrates how the method captures heritability signals in the FVE of cognitive phenotypes.
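Why can the FVE be split this way without losing anything? Assume the linear model y = Xβ + ε, where the predictors have population covariance Σ. Write the eigendecomposition as Σ = V_k Λ_k V_kᵀ + V_{-k} Λ_{-k} V_{-k}ᵀ, separating the top k principal components from the rest. Because the two eigenspaces are orthogonal, the signal variance βᵀΣβ splits exactly into two additive parts. The notation below is our own illustration of how such a decomposition adds up, not the paper's.

```latex
\mathrm{FVE}
= \frac{\beta^\top \Sigma \beta}{\operatorname{Var}(y)}
= \underbrace{\frac{\beta^\top V_k \Lambda_k V_k^\top \beta}{\operatorname{Var}(y)}}_{\text{low-dimensional, strong correlation}}
+ \underbrace{\frac{\beta^\top V_{-k} \Lambda_{-k} V_{-k}^\top \beta}{\operatorname{Var}(y)}}_{\text{high-dimensional, weak correlation}}
```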

📝 Abstract

The fraction of variance explained (FVE) in a linear model quantifies the extent to which predictors account for outcome variability. In high-dimensional settings, traditional FVE estimators do not apply. Modern FVE estimators, such as GWASH or the linear mixed-effects model estimated by restricted maximum likelihood (LMM-REML), struggle with strong correlation among predictors, which is common in, for example, brain imaging data. We propose a decomposition framework that partitions the FVE into two components: a low-dimensional component capturing the strong correlation, estimable by low-dimensional methods, and a high-dimensional component with the remaining weak correlation, estimable by high-dimensional methods. Simulations demonstrate that separating out the dominant principal components (PCs) and estimating the high-dimensional FVE with GWASH or LMM-REML reduces bias compared with applying GWASH or LMM-REML directly. Our method is asymptotically consistent as both the number of predictors and the number of samples increase. We illustrate the method in an analysis of the Adolescent Brain Cognitive Development (ABCD) brain imaging dataset, capturing nuanced heritability signals in the FVE of cognitive measures predicted by high-resolution brain imaging data.
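The abstract does not spell out the estimators. What follows is a minimal NumPy sketch of one plausible reading of the two-component procedure, not the authors' implementation. It rests on these assumptions, all ours:

- Predictors are standardized.
- The top `k` sample PCs of X form the low-dimensional block. Its FVE is taken as the ordinary least-squares R² of y on the PC scores.
- The complement is handled by a GWASH-style moment estimator of the form (p/n)(s² − 1)/μ̂₂. Here s² is the mean squared marginal z-statistic and μ̂₂ = tr(R²)/p − p/n is a bias-corrected estimate of tr(Σ²)/p.

The helper names `gwash_fve` and `pc_decomposed_fve` are hypothetical, and LMM-REML is not shown.

```python
import numpy as np


def gwash_fve(X, y):
    """GWASH-style moment estimate of the FVE of y explained by X (assumed form, hypothetical helper)."""
    n, p = X.shape
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)   # standardize predictors
    ys = (y - y.mean()) / y.std()                # standardize outcome
    u = Xs.T @ ys / np.sqrt(n)                   # marginal z-like statistics: sqrt(n) * correlation
    s2 = np.mean(u ** 2)                         # average squared statistic
    R = Xs.T @ Xs / n                            # sample correlation matrix
    mu2 = np.sum(R ** 2) / p - p / n             # bias-corrected estimate of tr(Sigma^2) / p
    return (p / n) * (s2 - 1.0) / mu2


def pc_decomposed_fve(X, y, k):
    """Split the FVE into a top-k PC component and a complement component (hypothetical helper)."""
    Xc = X - X.mean(axis=0)
    Xc = Xc / Xc.std(axis=0)
    yc = y - y.mean()

    # Top-k right singular vectors of the standardized X are the sample PC loadings.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    Vk = Vt[:k].T                    # p x k loadings
    Z = Xc @ Vk                      # n x k PC scores: the low-dimensional block
    X_rest = Xc - Z @ Vk.T           # projection onto the complement: the high-dimensional block

    # Low-dimensional component: OLS R^2 of y on the k PC scores.
    gamma, *_ = np.linalg.lstsq(Z, yc, rcond=None)
    fitted = Z @ gamma
    fve_low = fitted.var() / yc.var()

    # High-dimensional component: GWASH-style estimate on the OLS residual,
    # rescaled from Var(residual) back to Var(y).
    resid = yc - fitted
    share = resid.var() / yc.var()
    fve_high = share * gwash_fve(X_rest, resid) if share > 1e-12 else 0.0
    return fve_low, fve_high


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, p = 1000, 400
    factor = rng.standard_normal((n, 1))
    X = 0.9 * factor + 0.45 * rng.standard_normal((n, p))  # one dominant shared direction
    beta = rng.standard_normal(p) / np.sqrt(p)
    signal = X @ beta
    y = signal + signal.std() * rng.standard_normal(n)     # noise scaled so the true FVE is about 0.5
    low, high = pc_decomposed_fve(X, y, k=1)
    print(f"low-dim FVE = {low:.3f}, high-dim FVE = {high:.3f}, total = {low + high:.3f}")
```

In this sketch, the choice of `k` is the main design decision. Too few PCs leave strong correlation in the complement, where a GWASH-style estimator is expected to struggle. OLS on the PC scores stays well-posed as long as `k` is small relative to n. Plain R² is slightly optimistic for the low-dimensional part, and an adjusted R² is a simple alternative.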

Problem

fraction of variance explained

high-dimensional linear models

strong correlation

predictor correlation

FVE estimation

Innovation

Fraction of Variance Explained

Principal Components Decomposition

High-Dimensional Linear Models