AI Summary
This study investigates whether scale invariance is a key driver of the convergence between the visual representations of artificial intelligence (AI) models and those of the human brain. Method: The study introduces a multi-scale analytical framework that quantifies two properties of a model's representations, dimensional stability and structural similarity across scales, and tests whether these properties predict alignment with fMRI responses in the visual cortex. Contribution/Results: (1) Embeddings with more consistent dimension and higher structural similarity across scales align better with fMRI data; (2) larger pretraining datasets and the inclusion of language modalities enhance scale invariance and further improve neural alignment; (3) the manifold structure of fMRI data is more concentrated, with most features dissipating at smaller scales, and embeddings with similar scale patterns align more closely with it. Together, these findings indicate that scale invariance is a structural principle linking artificial and biological representations, and they offer a quantitative framework for evaluating the structural quality of human-like AI systems.
Abstract
Despite variations in architecture and pretraining strategies, recent studies indicate that large-scale AI models often converge toward similar internal representations that also align with neural activity. We propose that scale-invariance, a fundamental structural principle in natural systems, is a key driver of this convergence. In this work, we introduce a multi-scale analytical framework to quantify two core aspects of scale-invariance in AI representations: dimensional stability and structural similarity across scales. We further investigate whether these properties can predict alignment performance with functional magnetic resonance imaging (fMRI) responses in the visual cortex. Our analysis reveals that embeddings with more consistent dimension and higher structural similarity across scales align better with fMRI data. Furthermore, we find that the manifold structure of fMRI data is more concentrated, with most features dissipating at smaller scales. Embeddings with similar scale patterns align more closely with fMRI data. We also show that larger pretraining datasets and the inclusion of language modalities enhance the scale-invariance properties of embeddings, further improving neural alignment. Our findings indicate that scale-invariance is a fundamental structural principle that bridges artificial and biological representations, providing a new framework for evaluating the structural quality of human-like AI systems.
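To make "dimensional stability across scales" concrete, the following is a minimal sketch and not the authors' estimator, which the abstract does not specify. It assumes that a scale is a neighborhood size `k`, that local dimension is measured by the participation ratio of the local covariance spectrum, and that stability is scored as `1 / (1 + coefficient of variation)`. The function names `participation_ratio`, `dimension_across_scales`, and `dimensional_stability` are hypothetical and chosen only for illustration.

```python
import numpy as np


def participation_ratio(points):
    """Effective dimension of a point cloud: (sum e)^2 / sum(e^2),
    where e are the eigenvalues of the cloud's covariance."""
    centered = points - points.mean(axis=0, keepdims=True)
    # Squared singular values are proportional to covariance eigenvalues.
    eig = np.linalg.svd(centered, compute_uv=False) ** 2
    total = eig.sum()
    if total == 0:
        return 0.0
    return float(total ** 2 / (eig ** 2).sum())


def dimension_across_scales(embeddings, scales, n_anchors=50, seed=0):
    """Mean local effective dimension for each neighborhood size k in `scales`.

    Assumption: a "scale" is the number of nearest neighbors around an anchor.
    """
    rng = np.random.default_rng(seed)
    n = embeddings.shape[0]
    anchors = rng.choice(n, size=min(n_anchors, n), replace=False)
    # Squared Euclidean distances from each anchor to every point.
    sq = (embeddings ** 2).sum(axis=1)
    d2 = sq[anchors, None] + sq[None, :] - 2.0 * embeddings[anchors] @ embeddings.T
    order = np.argsort(d2, axis=1)
    dims = []
    for k in scales:
        local = [participation_ratio(embeddings[order[i, :k]])
                 for i in range(len(anchors))]
        dims.append(np.mean(local))
    return np.array(dims)


def dimensional_stability(dims):
    """Illustrative score in (0, 1]: 1.0 means the same dimension at every scale."""
    return float(1.0 / (1.0 + dims.std() / dims.mean()))


# Toy usage: a 3-dimensional latent cloud linearly embedded in 20 dimensions.
rng = np.random.default_rng(1)
toy = rng.normal(size=(400, 3)) @ rng.normal(size=(3, 20))
toy_dims = dimension_across_scales(toy, scales=(20, 80, 320))
print("dimension per scale:", toy_dims)
print("stability score:", dimensional_stability(toy_dims))
```

Under these assumptions, an embedding whose estimated dimension changes little between small and large neighborhoods scores close to 1. The sketch covers only the dimensional-stability aspect; cross-scale structural similarity and the fMRI alignment evaluation would need their own measures.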