🤖 AI Summary
This study addresses the substantial bias of existing fraction of variance explained (FVE) estimators, such as GWASH and LMM-REML, when predictors in a high-dimensional linear model are strongly correlated. To mitigate this bias, the authors propose a two-component FVE estimation framework based on principal component decomposition. The framework partitions the FVE into a low-dimensional component that captures the strong correlation and a high-dimensional component that carries the remaining weak correlation, and estimates each with a method suited to its dimension. The resulting estimator reduces the bias induced by strong correlation and performs consistently in the asymptotic regime where both the number of predictors and the sample size grow. Simulations and an analysis of the ABCD brain imaging cohort indicate that the proposed method reduces bias in FVE estimation relative to applying GWASH or LMM-REML directly, and that it captures nuanced heritability signals in the FVE of cognitive measures.
📝 Abstract
The fraction of variance explained (FVE) in a linear model quantifies the extent to which predictors account for outcome variability. In high-dimensional settings, where traditional FVE estimators do not apply, modern FVE estimators such as GWASH or the linear mixed-effects model estimated by restricted maximum likelihood (LMM-REML) struggle with strong correlation among predictors, which is common, for example, in brain imaging data. We propose a decomposition framework that partitions the FVE into two components: a low-dimensional component capturing the strong correlation, estimable by low-dimensional methods, and a high-dimensional component with the remaining weak correlation, estimable by high-dimensional methods. Simulations demonstrate that separating out the dominant principal components (PCs) and estimating the high-dimensional FVE with GWASH or LMM-REML reduces bias relative to applying GWASH or LMM-REML directly. Our method shows consistent performance asymptotically as both the number of predictors and the number of samples increase. We illustrate the method in an analysis of the Adolescent Brain Cognitive Development (ABCD) brain imaging dataset, capturing nuanced heritability signals in the FVE of cognitive measures predicted by high-resolution brain imaging data.
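The split described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `split_by_top_pcs` and `adjusted_r2`, the simulated factor model, the choice of `k = 2` dominant PCs, and the use of adjusted R² as the low-dimensional estimator are all assumptions made for illustration. The high-dimensional estimator (GWASH or LMM-REML) is deliberately not implemented; the sketch only prepares the weakly correlated remainder that such an estimator would receive.

```python
import numpy as np


def split_by_top_pcs(X, k):
    """Split column-centered X into its top-k PC part and the remainder.

    Returns the n x k PC scores, the rank-k part of X, and the complement.
    """
    Xc = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :k] * s[:k]      # PC scores for the k dominant components
    X_low = scores @ Vt[:k]        # strongly correlated, low-dimensional part
    X_high = Xc - X_low            # weakly correlated, high-dimensional part
    return scores, X_low, X_high


def adjusted_r2(Z, y):
    """Adjusted R^2 of an OLS fit of y on Z with an intercept.

    Used here as an assumed stand-in for a low-dimensional FVE estimator.
    """
    n, k = Z.shape
    Zc = Z - Z.mean(axis=0)
    yc = y - y.mean()
    coef, *_ = np.linalg.lstsq(Zc, yc, rcond=None)
    rss = np.sum((yc - Zc @ coef) ** 2)
    tss = np.sum(yc ** 2)
    return 1.0 - (rss / (n - k - 1)) / (tss / (n - 1))


# Hypothetical simulation: two latent factors induce strong correlation
# among p predictors, on top of independent noise.
rng = np.random.default_rng(0)
n, p, k = 300, 100, 2
factors = rng.normal(size=(n, k))
loadings = 3.0 * rng.normal(size=(k, p))
X = factors @ loadings + rng.normal(size=(n, p))
beta = rng.normal(size=p) / np.sqrt(p)
y = X @ beta + rng.normal(size=n)

scores, X_low, X_high = split_by_top_pcs(X, k)

# Low-dimensional component: regress the outcome on the dominant PC scores.
fve_low = adjusted_r2(scores, y)

# X_high would then be handed to a high-dimensional estimator such as
# GWASH or LMM-REML (not implemented in this sketch).
print(f"low-dimensional FVE estimate (adjusted R^2): {fve_low:.3f}")
```

Because the PC scores are orthogonal to the remainder by construction, the two components do not share predictor variation in the sample, which is what makes estimating them separately plausible in this toy setting.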