🤖 AI Summary
Despite surpassing human accuracy in visual recognition, deep neural networks (DNNs) exhibit stagnating or declining neural and behavioral alignment with primate vision—challenging the prevailing assumption that improved AI performance inherently advances biological vision modeling.
Method: We conduct the first systematic evaluation of over 100 models across three cross-species benchmarks: fMRI response prediction in humans, single-neuron response fitting in macaque inferotemporal (IT) cortex, and human perceptual similarity judgments.
Contribution/Results: State-of-the-art models (e.g., ViT, ConvNeXt) achieve high task accuracy yet fit primate representations significantly less well than mid-performing models; alignment degrades by up to 18% on key metrics. This reveals a non-monotonic relationship between AI benchmark performance and biological alignment, refuting the notion of automatic correspondence. We argue for a paradigm shift toward biologically constrained, mechanism-driven visual modeling that prioritizes neuroscientific validity over pure task optimization.
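A common way benchmarks like these quantify model–brain alignment is representational similarity analysis (RSA): build a representational dissimilarity matrix (RDM) over stimuli for both the model features and the neural responses, then correlate the two. The sketch below is a minimal, self-contained illustration of that idea on synthetic data; it is not the paper's actual evaluation pipeline, and all function names and data shapes here are illustrative assumptions.

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

def rdm(responses):
    """Condensed RDM: 1 - Pearson correlation between all stimulus pairs.
    `responses` has shape (n_stimuli, n_features_or_neurons)."""
    return pdist(responses, metric="correlation")

def alignment_score(model_features, neural_responses):
    """Spearman correlation between model and neural RDMs (one RSA variant)."""
    rho, _ = spearmanr(rdm(model_features), rdm(neural_responses))
    return rho

# Synthetic stand-ins: 50 stimuli, 80 hypothetical "neurons".
rng = np.random.default_rng(0)
neural = rng.normal(size=(50, 80))
# A model whose features are a linear mix of the neural code (well aligned).
aligned_feats = neural @ rng.normal(size=(80, 128))
# A model with unrelated features (poorly aligned).
random_feats = rng.normal(size=(50, 128))

print(alignment_score(aligned_feats, neural))  # high
print(alignment_score(random_feats, neural))   # near zero
```

On benchmarks of this kind, a score is computed per model and then plotted against task accuracy; the paper's claim is that this curve flattens or turns downward at high accuracy rather than rising monotonically.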
📝 Abstract
Deep neural networks (DNNs) once showed increasing alignment with primate perception and neural responses as they improved on vision benchmarks, raising hopes that advances in AI would yield better models of biological vision. However, we show across three benchmarks that this alignment is now plateauing, and in some cases worsening, as DNNs scale to human or superhuman accuracy. This divergence may reflect the adoption of visual strategies that differ from those used by primates. These findings challenge the view that progress in artificial intelligence will naturally translate to neuroscience. We argue that vision science must chart its own course, developing algorithms grounded in biological visual systems rather than optimizing for benchmarks based on internet-scale datasets.