The Variance Brain Foundation Models Forgot: Third-Order Statistics Predict Cognition Where Billion-Parameter Models Fail

📅 2026-05-29

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Although current brain foundation models (BFMs) have large parameter counts, they struggle to predict individual cognitive abilities, consistently underperforming even a simple linear regression on functional connectivity (FC). This work reveals for the first time that BFM pretraining fails to preserve the third-order co-skewness structure of fMRI, which is closely linked to cognition. To address this, we propose a linear approach that needs neither pretraining nor a GPU: it projects the fMRI signal onto the subspace that best preserves its co-skewness, computes FC within that subspace, and feeds the result to a linear regression. Our method consistently outperforms existing BFMs and raw-FC baselines across multiple datasets and cortical parcellation schemes. Moreover, finetuning BrainLM with a loss targeted at this subspace recovers the raw-FC performance ceiling from the model's forward pass.
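The pipeline described above can be sketched in NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the co-skewness tensor is estimated from z-scored signals, and the "subspace that best preserves co-skewness" is assumed here to be spanned by the top-k left singular vectors of the tensor's mode-1 unfolding (an HOSVD-style truncation). The helper names coskewness, coskew_subspace and subspace_fc are hypothetical.

```python
import numpy as np


def coskewness(ts: np.ndarray) -> np.ndarray:
    """Third-order co-skewness tensor of a (T, N) parcel time series.

    Each column is z-scored (population std), so
    S[i, j, k] = mean over t of z_i(t) * z_j(t) * z_k(t).
    For zero-mean signals this equals the standardized third-order cumulant.
    Memory is N**3 floats and the naive einsum costs O(T * N**3).
    """
    z = (ts - ts.mean(axis=0)) / ts.std(axis=0)
    return np.einsum("ti,tj,tk->ijk", z, z, z) / z.shape[0]


def coskew_subspace(S: np.ndarray, k: int) -> np.ndarray:
    """ASSUMED subspace choice: top-k left singular vectors of the mode-1
    unfolding of S. S is symmetric, so every mode gives the same subspace."""
    n = S.shape[0]
    unfolded = S.reshape(n, n * n)                       # (N, N^2)
    U, _, _ = np.linalg.svd(unfolded, full_matrices=False)
    return U[:, :k]                                      # (N, k), orthonormal columns


def subspace_fc(ts: np.ndarray, U: np.ndarray) -> np.ndarray:
    """FC computed after projecting z-scored signals onto span(U)."""
    z = (ts - ts.mean(axis=0)) / ts.std(axis=0)
    y = z @ U                                            # (T, k) projected signal
    fc = np.corrcoef(y, rowvar=False)                    # (k, k) correlations
    return fc[np.triu_indices(U.shape[1], k=1)]          # k*(k-1)/2 features


# Usage on synthetic data: 3 skewed (exponential) and 3 Gaussian channels.
rng = np.random.default_rng(0)
ts = np.hstack([rng.exponential(size=(5000, 3)), rng.standard_normal((5000, 3))])
S = coskewness(ts)
U = coskew_subspace(S, k=3)
features = subspace_fc(ts, U)
print(S.shape, U.shape, features.shape)  # (6, 6, 6) (6, 3) (3,)
```

In this toy example the retained subspace concentrates on the three skewed channels, since Gaussian channels contribute almost no co-skewness. The tensor has N³ entries, so this naive estimate is only practical for small N; real parcellations would need a cheaper estimator.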

📝 Abstract

Brain foundation models (BFMs) are self-supervised Transformers pretrained on fMRI data. We posit that these models should capture each subject's cognitive performance from their fMRI signal. Yet across three state-of-the-art BFMs and every readout we test, they predict cognition worse than a linear regression on the ~80K parameters of the functional connectivity (FC) matrix. The gap widens with scale: BrainLM's 650M-parameter model predicts cognition worse than its 111M-parameter model. We attribute this to a variance allocation problem: BFM pretraining captures the variance components that dominate fMRI but not the higher-order structure that predicts cognition. Our per-cumulant analysis of the reconstructed signal shows that the second-order covariance is partially preserved, while the third-order co-skewness tensor is largely destroyed. To recover what BFMs lose, we design a linear pipeline that projects the fMRI signal into the subspace that best preserves its co-skewness and computes FC there. This pipeline exceeds raw FC and every pretrained BFM on every dataset and parcellation we test, outperforming the prior state of the art under controlled evaluation with no pretraining and no GPU. By finetuning BrainLM with a loss targeted at this same subspace, we recover the raw-FC ceiling from its forward pass. This shows that the bottleneck is the pretraining objective, not the architecture or the model size.
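The ~80K figure matches the number of unique region pairs in a symmetric FC matrix. Assuming, for illustration only, a 400-region parcellation (the abstract does not name the atlas here), N(N-1)/2 = 400 × 399 / 2 = 79,800. The sketch below builds these FC features and fits a linear baseline; the ridge penalty, the synthetic data and the function names are assumptions, not details taken from the paper.

```python
import numpy as np


def n_fc_features(n_parcels: int) -> int:
    """Unique off-diagonal entries of a symmetric N x N FC matrix."""
    return n_parcels * (n_parcels - 1) // 2


def fc_features(ts: np.ndarray) -> np.ndarray:
    """Upper-triangle Pearson correlations of a (T, N) parcel time series."""
    fc = np.corrcoef(ts, rowvar=False)                 # (N, N)
    return fc[np.triu_indices(fc.shape[0], k=1)]       # N*(N-1)/2 edges


def ridge_fit_predict(X_train, y_train, X_test, alpha=1.0):
    """Dual-form ridge regression (ASSUMED regularizer); cheap when edges >> subjects."""
    x_mean, y_mean = X_train.mean(axis=0), y_train.mean()
    Xc = X_train - x_mean                              # center features
    gram = Xc @ Xc.T                                   # (n_subjects, n_subjects)
    dual = np.linalg.solve(gram + alpha * np.eye(len(gram)), y_train - y_mean)
    w = Xc.T @ dual                                    # one weight per FC edge
    return (X_test - x_mean) @ w + y_mean


print(n_fc_features(400))  # 79800, i.e. roughly 80K

# Usage on synthetic data: 50 subjects, 200 time points, 20 parcels.
rng = np.random.default_rng(0)
X = np.stack([fc_features(rng.standard_normal((200, 20))) for _ in range(50)])
y = rng.standard_normal(50)                            # placeholder cognitive scores
pred = ridge_fit_predict(X[:40], y[:40], X[40:], alpha=10.0)
print(X.shape, pred.shape)                             # (50, 190) (10,)
```

The dual form solves an n_subjects × n_subjects system instead of one the size of the edge count, which suits the many-features, few-subjects regime typical of cognition prediction from FC.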

Problem

Research questions and friction points this paper is trying to address.

brain foundation models

cognitive prediction

higher-order statistics

functional connectivity

variance allocation

Innovation

Methods, ideas, or system contributions that make the work stand out.

third-order statistics

co-skewness

variance allocation problem