Enhancing Few-Shot Image Classification through Learnable Multi-Scale Embedding and Attention Mechanisms

📅 2024-09-12

🏛️ arXiv.org

📈 Citations: 1

✨ Influential: 0

career value

151K/year

🤖 AI Summary

To address the limitations of single-scale feature matching in few-shot image classification—namely, loss of fine-grained details and insufficient generalization—this paper proposes a learnable multi-scale embedding framework. It employs a multi-output CNN to simultaneously extract shallow-level discriminative details and deep-level semantic features; introduces a staged self-attention mechanism to enhance cross-layer feature alignment; and integrates a learnable scale-weighting module for dynamic multi-scale fusion. Notably, this work is the first to jointly leverage staged self-attention and adaptive scale weighting for few-shot feature alignment, significantly improving prototype matching robustness and cross-domain generalization. The method achieves state-of-the-art performance on 5-way 1-shot and 5-shot tasks of MiniImageNet and FC100. Furthermore, extensive cross-domain evaluation across eight benchmark datasets demonstrates superior overall performance.

Technology Category

Application Category

📝 Abstract

In the context of few-shot classification, the goal is to train a classifier using a limited number of samples while maintaining satisfactory performance. However, traditional metric-based methods exhibit certain limitations in achieving this objective. These methods typically rely on a single distance value between the query feature and support feature, thereby overlooking the contribution of shallow features. To overcome this challenge, we propose a novel approach in this paper. Our approach involves utilizing a multi-output embedding network that maps samples into distinct feature spaces. The proposed method extracts feature vectors at different stages, enabling the model to capture both global and abstract features. By utilizing these diverse feature spaces, our model enhances its performance. Moreover, employing a self-attention mechanism improves the refinement of features at each stage, leading to even more robust representations and improved overall performance. Furthermore, assigning learnable weights to each stage significantly improved performance and results. We conducted comprehensive evaluations on the MiniImageNet and FC100 datasets, specifically in the 5-way 1-shot and 5-way 5-shot scenarios. Additionally, we performed cross-domain tasks across eight benchmark datasets, achieving high accuracy in the testing domains. These evaluations demonstrate the efficacy of our proposed method in comparison to state-of-the-art approaches. https://github.com/FatemehAskari/MSENet

Problem

Research questions and friction points this paper is trying to address.

Few-shot Learning

Image Classification

Information Retention

Innovation

Methods, ideas, or system contributions that make the work stand out.

multi-scale embeddings

self-attention mechanisms

learnable weights

🔎 Similar Papers

No similar papers found.