🤖 AI Summary
Fine-grained few-shot learning (FGFSL) commonly assumes deeper backbones (e.g., ResNet12) are inherently superior, leading to the underutilization of shallow networks like ConvNet-4. Method: This paper challenges the “depth implies stronger representation” assumption and proposes the Location-Aware Constellation Network (LCN-4), a lightweight ConvNet-4 variant enhanced with explicit positional modeling. Its core innovations include: (1) a location-aware feature clustering module that incorporates spatial structural priors; and (2) a unified grid-based positional encoding compensation mechanism coupled with frequency-domain positional embeddings to recover absolute and relative spatial information lost in standard convolutions. Results: Evaluated on three fine-grained few-shot benchmarks, LCN-4 significantly outperforms existing ConvNet-4 baselines and matches or exceeds state-of-the-art ResNet12-based methods—demonstrating that shallow architectures, when augmented with principled position awareness, achieve competitive fine-grained discriminative power.
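The paper's grid-based positional encoding compensation is not specified in detail here, but one common way to restore absolute spatial information to a convolutional feature map is to append normalized coordinate channels (in the spirit of CoordConv). The sketch below is a hypothetical illustration of that generic idea, not LCN-4's exact module:

```python
import numpy as np

def add_grid_position_channels(feat):
    """Append normalized (row, col) coordinate channels to a feature map.

    feat: array of shape (C, H, W).
    Returns an array of shape (C + 2, H, W), where the two extra channels
    hold y and x coordinates normalized to [-1, 1]. This is a generic
    CoordConv-style sketch, not the paper's exact compensation module.
    """
    c, h, w = feat.shape
    ys = np.linspace(-1.0, 1.0, h)
    xs = np.linspace(-1.0, 1.0, w)
    # grid_y varies along rows, grid_x along columns
    grid_y, grid_x = np.meshgrid(ys, xs, indexing="ij")
    return np.concatenate(
        [feat, grid_y[None].astype(feat.dtype), grid_x[None].astype(feat.dtype)],
        axis=0,
    )

feat = np.zeros((64, 5, 5), dtype=np.float32)
out = add_grid_position_channels(feat)
# out has shape (66, 5, 5); the top-left cell carries coordinates (-1, -1)
```

A subsequent 1x1 convolution over the augmented map can then mix positional and appearance information before clustering.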
📝 Abstract
Deep learning has seen extensive use across a wide spectrum of domains, including fine-grained few-shot learning (FGFSL), which depends heavily on deep backbones. Nonetheless, shallower backbones such as ConvNet-4 are not commonly preferred, because they tend to extract a larger share of low-level, non-abstract visual attributes. In this paper, we first re-evaluate the relationship between network depth and the ability to fully encode few-shot instances, and examine whether a shallow architecture can achieve performance comparable or superior to mainstream deep backbones. Building on the vanilla ConvNet-4, we introduce a location-aware constellation network (LCN-4), equipped with a novel location-aware feature clustering module. This module effectively encodes and integrates spatial feature fusion, feature clustering, and implicit feature localization, thereby significantly reducing the overall information loss. Specifically, we propose a general grid positional encoding compensation to address the positional information lost during feature extraction by ordinary convolutions. In addition, we propose a general frequency-domain positional embedding to compensate for the positional information lost in clustered features. We validate our approach on three representative fine-grained few-shot benchmarks. Experiments show that LCN-4 notably outperforms ConvNet-4-based state-of-the-art methods and achieves performance on par with or superior to most ResNet12-based methods, confirming our conjecture.
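The abstract's frequency-domain positional embedding is not detailed here; a standard way to inject position via frequencies is the Transformer-style sinusoidal embedding, which encodes each position as sines and cosines at geometrically spaced frequencies so that relative offsets correspond to fixed linear transforms. The following is a minimal sketch of that generic construction, offered only as an illustration and not as the paper's exact technique:

```python
import numpy as np

def sinusoidal_position_embedding(num_positions, dim):
    """Transformer-style sinusoidal positional embedding.

    Returns an array of shape (num_positions, dim) where even columns hold
    sin(pos / 10000^(2i/dim)) and odd columns the matching cosines.
    Generic sketch; the paper's frequency-domain embedding may differ.
    """
    assert dim % 2 == 0, "dim must be even"
    pos = np.arange(num_positions)[:, None]            # (P, 1)
    i = np.arange(dim // 2)[None, :]                   # (1, dim/2)
    freqs = 1.0 / (10000.0 ** (2.0 * i / dim))         # geometric frequencies
    angles = pos * freqs                               # (P, dim/2)
    emb = np.zeros((num_positions, dim))
    emb[:, 0::2] = np.sin(angles)
    emb[:, 1::2] = np.cos(angles)
    return emb

emb = sinusoidal_position_embedding(16, 8)
# emb[0] is [0, 1, 0, 1, ...]: position 0 maps to sin(0)=0, cos(0)=1
```

Such an embedding could be added to clustered feature vectors to restore positional cues discarded by the clustering step.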