🤖 AI Summary
To address the challenge of precise cross-modal (two-photon vs. fMOST) single-neuron matching under limited annotations, this paper proposes a few-shot framework for robust neuron recognition. Methodologically, it introduces: (1) a dual-channel attention mechanism that separates somatic morphology (local channel) from axonal and dendritic fiber context (global channel); (2) a joint optimization strategy that pairs MultiSimilarityMiner hard-sample mining with Circle Loss to strengthen discriminative feature learning; and (3) an integrated architecture that combines a pretrained Vision Transformer backbone with gated fusion of the complementary local and global attention channels. On two-photon and fMOST datasets, the method achieves higher Top-K accuracy and recall than existing approaches. Ablation studies confirm the contribution of each component, and an efficiency analysis shows a favorable trade-off between accuracy and training cost under different fine-tuning strategies. The authors position the work as a scalable basis for multimodal structure–function correlation analysis in neuroscience.
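The mining-plus-loss strategy in item (2) can be made concrete. The summary does not give the authors' implementation or hyperparameters; MultiSimilarityMiner matches the name of a miner class in the pytorch-metric-learning library, but that is an inference, not a stated fact. The framework-free sketch below implements the published multi-similarity mining rule and the Circle Loss formula on a precomputed cosine-similarity matrix, with labels as neuron identities (so a two-photon and an fMOST embedding of the same neuron form a positive pair). The helper names and the values of `epsilon`, `m`, and `gamma` are illustrative assumptions.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical helpers; hyperparameters are illustrative, not the paper's settings.
Pair = Tuple[int, int]


def similarity_matrix(embs: Sequence[Sequence[float]]) -> List[List[float]]:
    """Pairwise cosine similarities of a batch of embeddings."""
    norms = [math.sqrt(sum(x * x for x in e)) for e in embs]
    return [[sum(x * y for x, y in zip(a, b)) / (na * nb) for b, nb in zip(embs, norms)]
            for a, na in zip(embs, norms)]


def multi_similarity_mine(sims, labels, epsilon: float = 0.1) -> Tuple[List[Pair], List[Pair]]:
    """Multi-similarity mining: keep a negative pair if it is more similar than the
    anchor's least-similar positive minus epsilon, and a positive pair if it is less
    similar than the anchor's most-similar negative plus epsilon."""
    n = len(labels)
    pos, neg = [], []
    for i in range(n):
        p_sims = [sims[i][j] for j in range(n) if j != i and labels[j] == labels[i]]
        n_sims = [sims[i][j] for j in range(n) if labels[j] != labels[i]]
        if not p_sims or not n_sims:
            continue  # anchor needs at least one positive and one negative
        hardest_pos, hardest_neg = min(p_sims), max(n_sims)
        for j in range(n):
            if j == i:
                continue
            if labels[j] == labels[i] and sims[i][j] < hardest_neg + epsilon:
                pos.append((i, j))
            elif labels[j] != labels[i] and sims[i][j] > hardest_pos - epsilon:
                neg.append((i, j))
    return pos, neg


def _logsumexp(xs: Sequence[float]) -> float:
    top = max(xs)
    return top + math.log(sum(math.exp(x - top) for x in xs))


def _softplus(x: float) -> float:
    # Numerically stable log(1 + e^x)
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def circle_loss(sims, pos: List[Pair], neg: List[Pair],
                m: float = 0.25, gamma: float = 64.0) -> float:
    """Circle Loss averaged over anchors with at least one mined positive and negative:
    log(1 + sum_j exp(gamma*a_n*(s_n - m)) * sum_i exp(-gamma*a_p*(s_p - (1 - m)))),
    where a_p = max(1 + m - s_p, 0) and a_n = max(s_n + m, 0)."""
    per_anchor = {}
    for i, j in pos:
        per_anchor.setdefault(i, ([], []))[0].append(sims[i][j])
    for i, j in neg:
        per_anchor.setdefault(i, ([], []))[1].append(sims[i][j])
    losses = []
    for sp_list, sn_list in per_anchor.values():
        if not sp_list or not sn_list:
            continue
        pos_logits = [-gamma * max(1 + m - sp, 0.0) * (sp - (1 - m)) for sp in sp_list]
        neg_logits = [gamma * max(sn + m, 0.0) * (sn - m) for sn in sn_list]
        losses.append(_softplus(_logsumexp(pos_logits) + _logsumexp(neg_logits)))
    return sum(losses) / len(losses) if losses else 0.0
```

Mining first discards easy pairs, so Circle Loss only sees pairs near the decision boundary; Circle Loss then weights each remaining similarity by its distance from its optimum (`1 + m` for positives, `-m` for negatives), so the least-converged pairs are penalized most. In training, `sims` would be recomputed from the model's embeddings for every batch.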
📝 Abstract
In neuroscience research, single-neuron matching across imaging modalities is critical for understanding the relationship between neuronal structure and function. However, modality gaps and limited annotations make this matching difficult. We propose a few-shot metric learning method that combines a dual-channel attention mechanism with a pretrained vision transformer to enable robust cross-modal neuron identification. The local and global channels extract soma morphology and fiber context, respectively, and a gating mechanism fuses their outputs. To improve the model's fine-grained discrimination, we introduce a hard-sample mining strategy based on the MultiSimilarityMiner algorithm, together with the Circle Loss function. Experiments on two-photon and fMOST datasets show higher Top-K accuracy and recall than existing methods. Ablation studies and t-SNE visualizations validate the effectiveness of each module. The method also achieves a favorable trade-off between accuracy and training efficiency under different fine-tuning strategies. These results suggest that the proposed approach is a promising technical solution for accurate single-cell-level matching and multimodal neuroimaging integration.
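The gated fusion of the two channels and the Top-K evaluation can be sketched together. The abstract specifies neither the gate's exact form nor the retrieval protocol behind the Top-K numbers, so this minimal, framework-free sketch assumes a per-dimension sigmoid gate over the concatenated local and global channel outputs, and cosine-similarity ranking of an fMOST gallery for each two-photon query. `gated_fuse`, `top_k_accuracy`, and the toy vectors are hypothetical, not the authors' code; in a trained model the gate weights would be learned and the channel outputs would come from the ViT-based attention branches.

```python
import math
from typing import List, Sequence


def sigmoid(x: float) -> float:
    # Numerically stable logistic function
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def gated_fuse(local: Sequence[float], glob: Sequence[float],
               weights: Sequence[Sequence[float]], bias: Sequence[float]) -> List[float]:
    """Fuse local (soma) and global (fiber-context) channel outputs with a per-dimension gate.

    Hypothetical gate form: g = sigmoid(W [local; glob] + b),
    fused = g * local + (1 - g) * glob, where W has one row of length 2d per output dim.
    """
    concat = list(local) + list(glob)
    fused = []
    for row, b, l_val, g_val in zip(weights, bias, local, glob):
        gate = sigmoid(sum(w * x for w, x in zip(row, concat)) + b)
        fused.append(gate * l_val + (1.0 - gate) * g_val)
    return fused


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def top_k_accuracy(queries, gallery, true_idx, k: int) -> float:
    """Fraction of queries (e.g. two-photon neurons) whose true gallery match
    (e.g. the same neuron imaged by fMOST) is among the k most cosine-similar items."""
    hits = 0
    for q, target in zip(queries, true_idx):
        ranked = sorted(range(len(gallery)), key=lambda g: cosine(q, gallery[g]), reverse=True)
        if target in ranked[:k]:
            hits += 1
    return hits / len(queries)
```

A gate value near 1 lets the soma channel dominate a feature dimension, and a value near 0 defers to fiber context; because the gate depends on the input, the balance can shift from neuron to neuron, unlike a fixed weighted average.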