Self-Attention-Based Contextual Modulation Improves Neural System Identification

📅 2024-06-12
📈 Citations: 0
Influential: 0
🤖 AI Summary
Standard CNNs struggle to model the contextual sensitivity of visual cortical neurons, particularly their tuning peaks. Method: We introduce self-attention to explicitly capture non-local center-surround interactions, and propose "peak tuning" as an evaluation metric that, combined with tuning curve correlation, systematically quantifies contextual modulation. Through receptive field decomposition and parameter-matched comparisons, we dissect the functional specialization of local and surround information in tuning modeling. We show that self-attention can effectively replace late-stage convolutions and complements fully connected readout layers. Finally, we propose a staged learning paradigm for receptive field development and contextual modulation. Results: Experiments show our model significantly outperforms parameter-matched CNNs on both peak tuning and tuning curve correlation, validating the critical role of surround information in modeling tuning peaks and improving the robustness of learned center-surround interactions.

📝 Abstract
Convolutional neural networks (CNNs) have been shown to be state-of-the-art models for visual cortical neurons. Cortical neurons in the primary visual cortex are sensitive to contextual information mediated by extensive horizontal and feedback connections. Standard CNNs integrate global contextual information to model contextual modulation via two mechanisms: successive convolutions and a fully connected readout layer. In this paper, we find that self-attention (SA), an implementation of non-local network mechanisms, can improve neural response predictions over parameter-matched CNNs in two key metrics: tuning curve correlation and peak tuning. We introduce peak tuning as a metric to evaluate a model's ability to capture a neuron's top feature preference. We factorize networks to assess each context mechanism, revealing that information in the local receptive field is most important for modeling overall tuning, but surround information is critically necessary for characterizing the tuning peak. We find that self-attention can replace posterior spatial-integration convolutions when learned incrementally, and is further enhanced in the presence of a fully connected readout layer, suggesting that the two context mechanisms are complementary. Finally, we find that decomposing receptive field learning and contextual modulation learning in an incremental manner may be an effective and robust mechanism for learning surround-center interactions.
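The abstract evaluates models on two metrics: tuning curve correlation and peak tuning, a measure of whether the model captures a neuron's top feature preferences. A minimal sketch of both metrics, using only the Python standard library (the function names and the top-k definition of peak tuning are illustrative assumptions, not the paper's exact formulation):

```python
import random
from statistics import mean

def tuning_curve_correlation(predicted, observed):
    """Pearson correlation between predicted and observed responses
    across a stimulus set (one neuron's tuning curve)."""
    mp, mo = mean(predicted), mean(observed)
    cov = sum((p - mp) * (o - mo) for p, o in zip(predicted, observed))
    var_p = sum((p - mp) ** 2 for p in predicted)
    var_o = sum((o - mo) ** 2 for o in observed)
    return cov / (var_p * var_o) ** 0.5

def peak_tuning(predicted, observed, k=10):
    """Fraction of the k stimuli evoking the strongest observed responses
    that the model also ranks among its own top k -- a hypothetical
    instantiation of the paper's peak-tuning idea."""
    top_k = lambda xs: set(sorted(range(len(xs)), key=xs.__getitem__)[-k:])
    return len(top_k(observed) & top_k(predicted)) / k

# Synthetic example: noisy predictions of a simulated tuning curve
random.seed(0)
observed = [random.gammavariate(2.0, 1.0) for _ in range(200)]
predicted = [o + random.gauss(0, 0.5) for o in observed]
```

A model can score well on overall tuning curve correlation while still missing the few stimuli that drive the strongest responses, which is why the paper treats peak tuning as a separate metric.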
Problem

Research questions and friction points this paper is trying to address.

Improving neural response predictions using self-attention mechanisms.
Evaluating models' ability to capture neuron feature preferences.
Enhancing surround-center interactions through incremental learning strategies.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Self-attention improves neural response predictions.
Self-attention replaces late-stage spatial-integration convolutions when learned incrementally.
Decomposing receptive field learning from contextual modulation learning improves robustness.
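The claim that self-attention can replace late-stage convolutions rests on attention mixing information across all spatial positions in a single step, rather than growing the receptive field layer by layer. A minimal NumPy sketch of single-head self-attention over a flattened CNN feature map (the weight shapes, single head, and map size are illustrative assumptions, not the paper's actual module):

```python
import numpy as np

def self_attention(x, wq, wk, wv):
    """Single-head self-attention over spatial positions.
    x: (n, d) array of n spatial feature vectors (a flattened CNN map).
    Each output vector is a weighted mix of ALL positions, capturing the
    non-local center-surround interactions described above."""
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.T / np.sqrt(k.shape[1])       # (n, n) pairwise affinities
    scores -= scores.max(axis=1, keepdims=True)  # numerical stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=1, keepdims=True)      # softmax over positions
    return attn @ v, attn

rng = np.random.default_rng(1)
d = 16
x = rng.normal(size=(49, d))                 # e.g. a 7x7 feature map, flattened
wq, wk, wv = (0.1 * rng.normal(size=(d, d)) for _ in range(3))
out, attn = self_attention(x, wq, wk, wv)
```

Inspecting rows of `attn` shows, for each "center" position, how strongly surround positions contribute, which is the kind of surround-center interaction the paper probes.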
Isaac Lin
Carnegie Mellon University
Tianye Wang
Peking University
Shang Gao
Massachusetts Institute of Technology
Shiming Tang
Peking University
Tai Sing Lee
Professor of Computer Science, Carnegie Mellon University
Computational Neuroscience · Computer Vision