🤖 AI Summary
This study addresses brain decoding in low-data regimes, where performance is limited by the scarcity of labeled neural data. The authors use TRIBE v2, a large encoding model pretrained on more than 1000 hours of fMRI responses to video, audio and language, to generate synthetic fMRI data that augment small real datasets. They systematically evaluate how image-decoding performance varies with the amount of synthetic data used for training. Experiments on the 7T Natural Scenes Dataset and the 3T BOLD5000 dataset show up to a 68% improvement in Top-10 image-retrieval accuracy over decoders trained only on real data, with the proportion of synthetic data needed depending on the data source. Notably, decoders trained exclusively on synthetic fMRI perform above chance in some settings, suggesting that TRIBE v2 can support zero-shot brain-to-image decoding and that large pretrained models of fMRI responses may improve the data efficiency of image decoding.
📝 Abstract
Brain decoding is limited by the availability of labeled neural data and remains challenging in low-data regimes. To address this issue, we investigate whether and when brain decoding can be boosted by augmenting small fMRI datasets with synthetic data generated by a pretrained model of fMRI responses to stimuli. We use TRIBE v2, a large encoding model pretrained on more than 1000 hours of fMRI responses to video, audio and language. For each dataset, we evaluate systematic grids that show how the performance of image decoders varies with the amount of synthetic data used for training. Our results, based on two datasets (the 7T fMRI Natural Scenes Dataset and 3T fMRI BOLD5000), show up to 68% improvement in Top-10 image-retrieval accuracy compared to decoders trained only on real data. Importantly, the proportion of augmented data required to reach a given image decoding performance needs to be adjusted depending on the data source. Surprisingly, image decoders trained exclusively on synthetic fMRI can perform above chance in some settings, suggesting that TRIBE v2 can support zero-shot brain-to-image decoding. Together, these results show how large-scale models of fMRI responses to sight, sound and language may provide a foundation to improve the data efficiency of image decoding.
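The headline metric, Top-10 image-retrieval accuracy, and the "above chance" comparison can be made concrete with a minimal sketch. The abstract does not specify the embedding space, the similarity measure, or the size of the candidate set, so the cosine similarity used here and the names `top_k_retrieval_accuracy` and `chance_level` are illustrative assumptions, not the authors' evaluation code.

```python
import numpy as np


def top_k_retrieval_accuracy(pred, target, k=10):
    """Fraction of trials whose true image ranks among the k most similar candidates.

    pred:   (n_trials, dim) embeddings predicted by a decoder from fMRI.
    target: (n_trials, dim) embeddings of the images actually shown;
            row i of `target` is the correct match for row i of `pred`.
    Cosine similarity is an assumption; the abstract does not name the measure.
    """
    pred = pred / np.linalg.norm(pred, axis=1, keepdims=True)
    target = target / np.linalg.norm(target, axis=1, keepdims=True)
    sim = pred @ target.T                     # (n_trials, n_trials) similarities
    true_sim = np.diag(sim)                   # similarity to the correct image
    # Rank of the correct image = number of candidates scoring strictly higher.
    rank = (sim > true_sim[:, None]).sum(axis=1)
    return float((rank < k).mean())


def chance_level(n_candidates, k=10):
    """Expected top-k accuracy of a decoder that ranks candidates at random."""
    return min(k, n_candidates) / n_candidates


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    target = rng.standard_normal((200, 64))
    # Hypothetical decoder output: the true embedding plus heavy noise.
    pred = target + 3.0 * rng.standard_normal((200, 64))
    print("top-10 accuracy:", top_k_retrieval_accuracy(pred, target, k=10))
    print("chance level:   ", chance_level(200, k=10))
```

Under this reading, a decoder trained only on synthetic fMRI performs "above chance" when its top-10 accuracy exceeds 10 divided by the number of candidate images.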