🤖 AI Summary
It remains unclear whether representational similarities between artificial vision models and the human visual cortex arise from shared architectural priors or from universal principles of natural image processing. Method: Using representational similarity analysis (RSA) with fMRI response modeling, we systematically compared intermediate representations across hundreds of heterogeneous deep visual models. Contribution/Results: We find that, despite varied architectures and task objectives, diverse models converge on a shared set of highly generalizable representational dimensions, and that these universal dimensions are significantly better aligned with human V1–IT fMRI responses than model-specific ones. Remarkably, as few as eight such dimensions per network preserve over 90% of its model–brain representational similarity. These findings indicate that representational alignment between artificial and biological vision stems not from architectural idiosyncrasies but from a shared, universal structure in image representation, offering a new paradigm for uncovering fundamental principles of visual intelligence.
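The core comparison here is RSA: build a representational dissimilarity matrix (RDM) over the same stimuli for a model and for the brain, then correlate the two. Below is a minimal sketch of that pipeline, assuming `model_features` (stimuli × units) and `brain_responses` (stimuli × voxels) arrays are already extracted; the variable names and the correlation-distance/Spearman choices are illustrative conventions, not necessarily the paper's exact settings.

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

def rdm(responses: np.ndarray) -> np.ndarray:
    """Representational dissimilarity matrix, in condensed (upper-triangle) form.

    `responses` is (n_stimuli, n_features): model activations or fMRI voxels.
    Uses correlation distance (1 - Pearson r) between stimulus response patterns.
    """
    return pdist(responses, metric="correlation")

def rsa_score(model_features: np.ndarray, brain_responses: np.ndarray) -> float:
    """Spearman correlation between the model RDM and the brain RDM."""
    rho, _ = spearmanr(rdm(model_features), rdm(brain_responses))
    return rho

# Random placeholders standing in for real stimuli-evoked data:
rng = np.random.default_rng(0)
model_features = rng.standard_normal((100, 512))    # 100 stimuli, 512 units
brain_responses = rng.standard_normal((100, 2000))  # 100 stimuli, 2000 voxels
print(rsa_score(model_features, brain_responses))
```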
📝 Abstract
Do neural network models of vision learn brain-aligned representations because they share architectural constraints and task objectives with biological vision, or because they learn universal features of natural image processing? We characterized the universality of hundreds of thousands of representational dimensions from visual neural networks with varied construction. We found that networks with varied architectures and task objectives learn to represent natural images using a shared set of latent dimensions, despite appearing highly distinct at a surface level. Next, by comparing these networks with human brain representations measured with fMRI, we found that the most brain-aligned representations in neural networks are those that are universal and independent of a network's specific characteristics. Remarkably, each network can be reduced to fewer than ten of its most universal dimensions with little impact on its representational similarity to the human brain. These results suggest that the underlying similarities between artificial and biological vision are primarily governed by a core set of universal image representations that are convergently learned by diverse systems.
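One way to make "universal dimensions" operational is to score each latent dimension of a target network by how well it can be linearly predicted from a different network's features, then keep only the top-scoring dimensions and re-measure brain alignment. The sketch below is a hypothetical reading of that idea (ridge regression, 5-fold cross-validation, and the k cutoff are illustrative choices, not the paper's exact procedure); it reuses `rsa_score` and the placeholder arrays from the sketch above.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

def universality_scores(target_feats: np.ndarray,
                        other_feats: np.ndarray,
                        alpha: float = 1.0) -> np.ndarray:
    """Cross-validated R^2 for predicting each dimension of one network
    from another network's features; higher scores = more universal."""
    scores = np.empty(target_feats.shape[1])
    for j in range(target_feats.shape[1]):
        scores[j] = cross_val_score(
            Ridge(alpha=alpha), other_feats, target_feats[:, j],
            cv=5, scoring="r2",
        ).mean()
    return scores

def keep_most_universal(target_feats: np.ndarray,
                        other_feats: np.ndarray,
                        k: int = 10) -> np.ndarray:
    """Reduce a network to its k most universal dimensions."""
    scores = universality_scores(target_feats, other_feats)
    top = np.argsort(scores)[::-1][:k]  # indices of the k highest scores
    return target_feats[:, top]

# Hypothetical usage, comparing full vs. reduced model-brain alignment:
# reduced = keep_most_universal(model_features, other_model_features, k=8)
# print(rsa_score(reduced, brain_responses))  # vs. rsa_score(model_features, ...)
```

If the abstract's claim holds under this kind of reduction, the RSA score of the k-dimensional `reduced` features should remain close to that of the full feature set.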